













REVIEW OF EDUCATIONAL RESEARCH 


Official Publication of the American Educational Research 
Association, a department of the National Education Association. 
The contents of the REVIEW are listed in the EDUCATION INDEX 





Volume V June, 1935 Number 3 





PSYCHOLOGICAL TESTS
(Literature reviewed from January, 1932, to January, 1935) 


Prepared by the Committee on Psychological Tests: Gertrude Hildreth, Willard C. 
Olson, Herbert Toops, Goodwin Watson, and Harry J. Baker, Chairman; with the 
cooperation of G. Frederic Kuder. 


TABLE OF CONTENTS 
Chapter

Introduction

I. Intelligence and Its Measurement
Harry J. Baker, Detroit Public Schools, Detroit, Michigan.

II. Applications of Intelligence Testing
Gertrude Hildreth, Teachers College, Columbia University, New
York, New York.

III. [Title illegible in source]
Herbert Toops and G. Frederic Kuder, Ohio State University,
Columbus, Ohio.

IV. Test Construction and Statistical Interpretation
Herbert Toops and G. Frederic Kuder, Ohio State University,
Columbus, Ohio.

V. General Survey of the Field of Character and Personality
[remainder of title illegible in source]
Willard C. Olson, University of Michigan, Ann Arbor, Michigan.

VI. Mental Hygiene and Emotional Adjustment
Goodwin Watson, Teachers College, Columbia University, New York,
New York.

VII. Social Attitudes
Goodwin Watson, Teachers College, Columbia University, New York,
New York.

VIII. Measures of Character and Personality through Conduct and
[remainder of title illegible in source]
Willard C. Olson, University of Michigan, Ann Arbor, Michigan.

Copyright, 1935
By National Education Association
Washington, D. C.





All Rights Reserved











INTRODUCTION 


In this number of the Review of Educational Research the topic of
psychological tests covers tests of intelligence, aptitude, personality, and 
character. Reviews of these topics in the first cycle of the Review proved 
to be so voluminous that separate numbers were issued on tests of per- 
sonality and character in June, 1932, and on tests of intelligence and 
aptitude in October, 1932. The present issue covers the three years 1932, 
1933, and 1934; but even in this brief period the task of selecting impor-
tant reports of research has been arduous, and literally thousands of them
have necessarily been eliminated. 

A review of psychological tests offers certain unique problems of selec-
tion which are difficult to treat satisfactorily. The discussions should be
concerned primarily with the nature of the traits to be tested. But the
nature of psychological traits is partly a matter of rational deduction and
philosophical formulation, and as such cannot be expressed as standard
deviations or as coefficients of correlation. Although statistical devices are 
convenient vehicles for evaluating research, theories of trait constitution 
cannot be omitted or ignored in reports of research. 

A second difficulty is that of presenting reports of research in the light 
of valid statistical formulas, without placing too much or too little emphasis 
upon statistical validity. In response to demands for greater emphasis 
upon the importance of statistical procedures, Dr. Toops has prepared a 
chapter of this issue dealing with that topic. His critical review of this 
field should be of interest to workers in fields of measurement other than 
psychological tests. Research confronts on one hand the danger of making 
deductions from data which are not statistically valid, and on the other 
that of becoming top-heavy and cumbersome with the necessary technics 
of validation. 

Critical evaluations of psychological tests cannot be limited to the tests 
themselves, but must consider applications of results to specific situations, 
groups, and populations. Therefore it is practically impossible to consider 
psychological tests without reporting on such topics as sex differences, the 
efficiency of special instructional methods, the phenomena of mental senes- 
cence, or any topic susceptible to psychological measurement. In these 
discussions the fields of test applications border on those of instruction, 
of social differences, in fact on the entire range of educational and psycho- 
logical processes. These marginal fields are being discussed in this Review 
primarily from the standpoint of test technic and secondarily from the 
practical, utilitarian application of results. 

In the preparation of this Review special mention should be made of 
the contributions of Dr. Olson, who not only was assigned duties as sub- 
chairman for the tests of personality and character, but who also prepared 
a substantial number of bibliographies and abstracts for all members of
the committee.

Harry J. Baker, Chairman,
Committee on Psychological Tests.
CHAPTER I


Intelligence and Its Measurement 


Sources of general information about intelligence testing are given in 
Pintner’s annual summaries (87) which cover theories, new tests and 
texts, and results of studies and investigations. A comprehensive bibli- 
ography has been prepared by Hildreth (45). General texts on intelligence 
and its measurement are offered by Boynton (14), by Garrett and Schneck 
(34), and a second edition in 1931 by Pintner (86). 

This chapter will review the nature of intelligence in its general and 
special aspects; the clinical interpretation of functions; mental growth 
and changes; test standardization and variability; new tests; and suggested 
topics for further research. 


Nature of Intelligence 


General ability—For some years a debate has been in progress between 
Spearman and Thorndike as to the nature of general intelligence. Spear- 
man (104) offered a hypothesis that there is a general factor “G” which 
permeates all mental activity and invariably produces a positive correla- 
tion between any two types of mental tests. A second “S” factor (actually 
a group of “S” or specific factors) exists which comes to light, depending 
upon the unique type of mental activity being tested. Thorndike proposed a 
series of traits more or less closely related to one another rather than a strong 
central “G” factor. Spearman (105) recently discussed this point by stating 
that an ability or trait which seems to be unitary may actually be a com- 
posite hidden in the various traits, and that the only safe procedure is not 
to assume what the tests measure but to study their intercorrelations. 

Bruckner (18) disagreed with Thorndike (113) on the assumption that 
only a quantitative factor is the fundamental cause of differences in intelli- 
gence such as is measured on the CAVD tests. Bruckner held that qualita- 
tive differences in the original endowment of associative power are more 
important bases, and that Thorndike’s tests involve the danger of splitting 
intelligence into partial segments without ever getting a grasp of the total 
mental constitution. 

R. C. Tryon (118) examined ten important investigations with one hun- 
dred or more subjects by the method of tetrad differences and concluded 
that since the tetrads dispersed around zero, the Spearman two-factor 
theory was not as consistent as a multiple-factor theory of intelligence. 
Spearman (105) answered, claiming irrelevancy, incorrect interpretation, 
and disregard for the group-factor concept. In two studies Piéron (84, 85) 
deduced the existence of four types of intelligence: verbal, numerical,
logical, and one of “common sense.” He emphasized that the intercorrela-
tions of these types are low, which is also true between comprehension and
invention. These latter two types, he believes, are the most common com- 
ponents of many intelligence tests. He contended that a whole series of 
mental tasks should be given which apply to all phases of learning rather 
than to what is ordinarily found in intelligence tests. 
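
The tetrad-difference criterion referred to in the Tryon and Spearman exchange above can be illustrated with a short sketch. It assumes the usual statement of the criterion: if every correlation arises from a single general factor, so that the correlation between tests i and j equals the product of their loadings, then each tetrad difference such as r_ab·r_cd − r_ac·r_bd vanishes apart from sampling error. The loadings and the added overlap below are hypothetical values chosen for illustration and are not taken from any study reviewed here.

```python
from itertools import combinations


def tetrad_differences(r, a, b, c, d):
    """Return the three tetrad differences for tests a, b, c, d."""
    return (
        r[a][b] * r[c][d] - r[a][c] * r[b][d],
        r[a][b] * r[c][d] - r[a][d] * r[b][c],
        r[a][c] * r[b][d] - r[a][d] * r[b][c],
    )


def one_factor_correlations(loadings):
    """Correlation matrix implied by a single general factor."""
    n = len(loadings)
    return [
        [1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]


def all_tetrads(r):
    """Collect every tetrad difference over all sets of four tests."""
    values = []
    for a, b, c, d in combinations(range(len(r)), 4):
        values.extend(tetrad_differences(r, a, b, c, d))
    return values


# Hypothetical general-factor loadings for five tests.
loadings = [0.9, 0.8, 0.7, 0.6, 0.5]
r = one_factor_correlations(loadings)
max_abs = max(abs(t) for t in all_tetrads(r))

# Hypothetical extra overlap between the first two tests (a group factor).
r_group = [row[:] for row in r]
r_group[0][1] = r_group[1][0] = r[0][1] + 0.10
max_abs_group = max(abs(t) for t in all_tetrads(r_group))

print(f"largest tetrad, one general factor: {max_abs:.2e}")
print(f"largest tetrad, with extra overlap: {max_abs_group:.3f}")
```

With a single general factor the tetrads are zero to rounding error; the added overlap between two tests produces tetrads clearly different from zero, which is the kind of departure the criterion is meant to detect.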

Thurstone (116) treated the theories of intelligence in somewhat the 
same manner as Piéron. He showed that Spearman’s general factor and a 
second factor that is specific to each test or variable are actually only 
special cases within a multiple-factor theory. Eventually it will be possible 
to identify several mental abilities which reveal themselves as distinct and 
among them will probably be verbal ability, perceptual relations, and 
arithmetical ability. In the same discussion Thurstone reported the appli-
cation of the multiple-factor theory to psychotic symptoms with the dis-
covery of five clusters of symptoms, and also to vocational choices in which
he found that there is a relatively small number of types of vocational 
interest groups. 

These investigations with the improved statistical methods of analysis 
hopefully point to a better understanding of the nature of intelligence. 

Special abilities—Investigations in the field of special abilities and dis- 
abilities have been comparatively meager, although certain new tests 
show some new attempts in this direction. When the analytical procedures 
proposed by Thurstone have better localized the special areas of intelli- 
gence, new tests will probably be developed to measure them. Many of 
the recent investigations on specific tests deal with tests of performance. 

McElwee (63) analyzed the free association test included in the Stan- 
ford-Binet on 200 school children and found few association sequences, 
but in some cases children gave groups of objects in the same class. Hutt 
(48) offered a revision of the Kohs’ Block Design Test, showing that of 
the three factors, success, speed, and moves, the third is of no value in 
increasing the diagnostic value or the validity of the test. Line and Ford 
(58) examined 705 children with the Knox Cube Test and found that the r
with chronological age was low, with Binet M.A. was .58, and with school
grade was .39. This test had low reliability and only four levels of difficulty. 
Edds (29) found very little in common between tests of verbal and non- 
verbal ability in measuring 53 college and 140 high-school students. 
Mental ages were developed for the separate tests or pages of the Detroit 
Group Intelligence Tests by Baker (7), and when taken with similar 
results for tests of mechanical aptitude afford a comprehensive profile of 
special abilities and disabilities. Wechsler (123) proposed a similar use 
of the subparts of the Army Alpha Test. 

Performance and non-language tests—Several new tests in these fields 
will be mentioned later. Research projects on applications of such tests are 
few in number. Feinberg (31) reported correlations of Stanford-Binet 
and Pintner-Paterson tests on 807 children as being moderately positive 
with lower results for superior children and adults. Babcock (3, 5), using 
these same tests on adult foreigners, found them rating lower on the Stan-
ford-Binet than on performance. Lincoln (57) reported high reliability
on odd-even items and low Binet correlation for the Lincoln Hollow Square 
Form Board. 
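
An odd-even reliability of the kind Lincoln reported is ordinarily obtained by correlating scores on the odd-numbered items with scores on the even-numbered items. The sketch below follows that procedure and also applies the Spearman-Brown step-up, 2r / (1 + r), to estimate the reliability of the full-length test. Whether Lincoln applied the step-up is not stated here, and the item scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_reliability(item_scores):
    """Return (half-test r, Spearman-Brown full-test estimate)."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    r_full = 2.0 * r_half / (1.0 + r_half)
    return r_half, r_full


# Hypothetical pass (1) / fail (0) records: six subjects, six items.
item_scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_half, r_full = odd_even_reliability(item_scores)
print(f"half-test r = {r_half:.3f}, stepped-up r = {r_full:.3f}")
```

The stepped-up value is always higher than the half-test correlation when that correlation is positive, since each half is only half as long as the test actually given.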

Four hundred children, grades 4 to 8, were tested by Armstrong (1) with 
Army Beta, short Army Performance, and Otis Group tests. Although he 
found an r of .633 between Beta and Otis, he recommended a re-standardi- 
zation of Beta up to thirteen years of age. Army Performance has a more 
satisfactory standardization, but a lower correlation of .629 with Otis. 
He recommended Army Performance for clinical practice. 

Senour (100) tested children from foreign families in East Chicago 
with Haggerty Delta 2 and Pintner Non-Language Mental Test and con- 
cluded that language handicap was rather serious in the Haggerty Test. 
Mowrer (75) reported on the performance of 83 children aged twenty-four 
to sixty-four months on 18 items of the Merrill-Palmer Scale. Bradbury 
(15) applied the Descoeudres Performance Test to nursery-school chil-
dren, and reported on test administration and validation, with an r of .56
with Stanford-Binet. 


Clinical Interpretations 


In the clinical interpretations of intelligence, attempts have been made 
to analyze responses to tests and testing situations as well as to report 
the gross responses in terms of intelligence quotient or mental age. Various 
phases will be considered briefly here, since there tends to be an over- 
lapping with test applications in Chapter II and with special talents and 
defects reported above. 

Speed—Beck (9, 10) reported two studies on the factor of speed. He 
(10) reviewed thirty-four studies which showed a correlation range from 
-.32 to .90 between speed and simple discriminative reaction; -.03 to .53
with serial reaction; .14 to .32 with speed of reading, etc.—generally low 
correlations. In his other study he (9) concluded that speed, as measured 
in the usual test content, is not a measure of intelligence. C. Tryon and 
Jones (117) interpreted a low correlation between speed, as measured by 
screen exposure of simple narrative material, and altitude on the Thorn- 
dike CAVD tests as showing no marked community of function between 
speed and altitude or level of intelligence. Freeman (32) found a small
but important number of cases among 117 university students whose
scores on the Ohio State University Psychological Test were influenced 
by varying the time limits. Graf (38) found considerable shifting of rela- 
tive rank orders of 100 policemen in the early stages of a series of mental 
tests. Constancy of rank tended to be reached only when about 80 percent 
of the total time had elapsed which would be necessary to complete the 
entire test. Sutherland (108) concluded that speed was a factor in intelli- 
gence only when the material was easy. Line and Kaplan (59) found that 
speed is not related to mental age, that it is subject to improvement with
practice, and that the alertness of young bright pupils is probably stimu-
lated by competition with older pupils in their classes.

Qualitative phases—The manner of test response is constantly in need 
of observation. Reymert and Hartman (93) found the correlation between 
early and later trials on the Knox Cube Test raised from .36 to .40 when 
the most variable first trial was eliminated from the computations. They 
found no constant relationship between procedure and intelligence and 
that the former can be judged only by observation. Tulchin (119) and 
Blumenfeld (12) emphasized the importance of such attendant factors as 
fatigue, embarrassment, stability, social and economic status, and language 
handicap. Medrow (67) reported that the methods and attitude with which 
subjects carried out form-board tests reflected personality, speed, com-
prehension, concentration, precision, thoroughness, and similar traits.
Nass (76) devised a test of constructing designs by twining string around 
a board with about 100 wooden pegs. In making entire copies of designs 
as well as in completing a design when only a part was shown there was 
a surprising uniformity of kinds of errors. Figures to be copied could not 
be grasped as wholes but had to be analyzed into parts; and awareness 
of partial success was necessary for further completion of the figures. 

Interpretation in psychopathic cases—The unique and unusual responses
of psychopathic patients to mental test situations have furnished many valu-
able diagnostic clues. Schott (99) gave two or more administrations of the
Stanford-Binet Test to neuro-psychiatric cases and found no relationship
between variability and I.Q. level. The correlation between I.Q.’s on first
and second trial was approximately .9, but the tests were used mainly as a
barometer of the upward or downward trend of mental functioning. Bab- 
cock (4) stressed the need of studying psychopathic individuals as to the 
time of responses to familiar test material, as to their impressions of new 
test data, as to a measure of educational level, and as to variability between 
two test ratings, which reflects a disturbed status of sensory functions.

Binder (11) studied psychopathic, neurotic, and normal cases on the 
Rorschach Test. He discovered that relatively much more importance 
must be attached to the central mood reactions in psychopathic personali-
ties than in neurotic, and more in neurotic than in normal; and conversely,
adaptivity to the peripheral or environmental stimulus plays a lesser role 
in the psychopathic than in the neurotic, and less in the neurotic than in 
the normal. Takamine (109) devised a complex performance test which 
required coordination in three different types of reactions simultaneously. 
The right hand turned down the crank to roll down a band of geometrical 
figures in an irregular order, so that these figures appear and disappear 
in a constant stream in which the subject must count all of a specified 
kind, and tap with a counter for the total number of all figures. On this 
test it was found that normal children decrease their errors with age, with 
a remarkable development in ability after the twelfth year. Girls were 
faster than boys but made four times as many errors. Motormen inclined 
to accidents made more errors than normal persons. Mental defectives 


190 





nan am A 6 Be -tehe A See 


finished in quick time but with many errors. In paralytic dementia, effi- 
ciency decreased to 70 percent, to 34 percent, and finally to complete 
inability. Dementia praecox patients remained at a relatively high level 
difficult to differentiate from neurasthenics. Traumatic neurosis patients 
had a low index of efficiency which became progressively worse. From 
these sample studies it is evident that much valuable clinical information 
can be gleaned from observing test reactions, particularly in performance 
tests. 


Mental Growth 


The problem of mental growth has always been complicated by the 
varying content of tests at the several age levels, by differing degrees of 
test difficulty and by inadequate knowledge of growth curves, particularly 
at the adult levels. For a more complete discussion of this topic the reader 
is referred to the report of Stoddard’s committee (106). This section con- 
siders mental growth in general, the constancy of the I.Q. within various 
groups, and the characteristics of adult mentality. 

General studies—Jordan (49) gave various forms of the National In- 
telligence Tests to 183 children for six semi-annual intervals. The I.Q.’s 
remained fairly constant and growth curves were practically parallel at 
all I.Q. levels. Takamine (110) examined the same group of 109 Japanese
children for the six-year compulsory school course. Ninety-two percent of 
all individuals’ abilities were predicted with reasonable accuracy from the 
initial tests. Wilcocks (129) tested 16,574 pupils in South Africa of Euro- 
pean descent from twelve to sixteen years of age and found an almost 
perfectly normal distribution of ability with standard deviations increas- 
ing slightly with chronological age. Growth could be expressed in terms 
of a hyperbolic equation. Wheeler (128) examined the mental growth 
of dull Italian children for four consecutive years with the Dearborn 
Group Intelligence Tests. He found that at time of school entrance they 
were retarded nearly a year mentally. This increased to over two and 
one-half years at the age of eleven years. His data showed that dull chil- 
dren reach their mental maturity earlier than normal and superior children. 

Constancy of the I.Q.—Nemzek (78) reviewed 247 titles and found test
reliabilities running mostly between .75 and .95, so that charges of gross 
unreliability are unwarranted. P. Cattell (19) found a fictitious rise of 
four or five points in I.Q. due to familiarity with test material repeated
within three or four months, but an insignificant gain when the interval
between tests was six months or longer. R. R. Brown (17) reported twice
as much variability in I.Q.’s when the interval between tests was from five
to nine years than when it was less than two years, and that 25 cases in 100
change their I.Q. more than 15 points after a seven-year interval. Lincoln
and Wadleigh (55) compared I.Q.’s of 150 children on three well-known
group tests and found a shift of five points or less in 31.3 percent of the
cases, and a shift of ten points or more in 36.9 percent of the
cases. This change is only approximately one point or two greater than
for individual tests. 

Preschool studies—There have been relatively more studies of test 
variability and mental growth changes at the preschool level than in any 
other age group. Wellman (124, 125, 126) made the most extensive studies 
in this field. In one of these studies she (126) found striking improvements 
in the I.Q.’s, especially by those below average. Gains in test scores are 
associated with preschool attendance. In her most recent report she (125) 
found that children who remained in the University system were higher 
in I.Q. by about 8 points at the age of eight and one-half years than those 
who had transferred to other schools. Non-preschool children made no 
change in I.Q. but matched preschool children made substantial gains. 


It would be better to think of intelligence test results as representing intellectual 
status at the moment and to proceed cautiously on the problem of precise prediction 
until more is known about variations under differing conditions. Certainly intelligence 
cannot be regarded as static; it should be regarded in terms of growth rather than as 
a fixed quantity. Nevertheless, there are probably definite constitutional limits within 
which growth may be altered (125: 80-81). 


Hallowell (42) tested 438 children from three to forty-seven months 
of age and retested at ages from one to eight years. Test-retest variations 
of less than 5 points occurred in 48 percent to 55 percent of the cases, and 
10-point variations included from 78 percent to 88 percent, mostly in the 
direction of improvement which he ascribed to change of environment 
and to development of language. Furfey and Muehlenbein (33) found that 
scores on the Linfert-Hierholzer scale administered in the second year 
of life did not predict Stanford-Binet scores four years later. 

Additional experimental studies on other mental phenomena of pre- 
school children were conducted by Shacter (101) who described a method 
for measuring sustained attention, Poyntz (89) on the efficacy of visual
and auditory distractions, Rüssel (97) on form comprehension of two-
to five-year-old children, K. A. Miles (70) on sustained visual fixation, 
Hurlock and Newmark (47) on memory span, and Grigsby (41) on the 
development of concepts of relationship as evidenced by their expressive 
ability. 

The mentally gifted and superior—Nemzek (79, 80) reported two 
studies on the I.Q. constancy of the mentally gifted. In one study of 52 
children on the Herring-Binet, test-retest ratings correlated .73 ± .04,
with a range in change of I.Q. of -19 to 22, as contrasted with .98 ± .01
and -3 to 8 respectively for average children. The average children were
retested within twenty-four hours as against a one-year interval for the 
gifted. Lincoln (56) reexamined 92 children with I.Q.’s of 119 or over
at intervals from five to eight years. He found that I.Q.’s tend to decrease
and that girls’ are likely to decrease more than boys’. P. Cattell (20) 
found I.Q. trends in 288 superior children just the opposite of Lincoln's 
results and of two Stanford University investigations. 
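
The figures .04 and .01 attached to Nemzek's two correlations above are presumably probable errors, the usual accompaniment of a correlation coefficient in reports of this period. That reading is an assumption, since the statistic is not named here. Under the classical formula PE = 0.6745 (1 − r²) / √N, the figure for the gifted group can be checked from the values given (r = .73, N = 52). The size of the average group is not reported, so its figure is not checked.

```python
from math import sqrt


def probable_error_of_r(r, n):
    """Classical probable error of a correlation coefficient."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)


# Values reported for the gifted group: r = .73 on 52 children.
pe_gifted = probable_error_of_r(0.73, 52)
print(f"probable error = {pe_gifted:.3f}")
```

The computed value rounds to .04, which agrees with the reported figure and supports reading it as a probable error.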
The feebleminded—Woodall (130) reported an average trend of -0.45
points in I.Q. per year in 497 institutionalized feebleminded from six to
sixteen years in chronological age; but above sixteen years of .99 points 
per year. Statistically significant upward trends occurred in 6.2 percent 
of the cases and downward trends in 8.2 percent. Girls were more variable 
than boys, morons more variable than imbeciles, and imbeciles more than 
idiots. A similar trend in I.Q. by age was discovered by Hoakley (46) in 
550 cases. Fifty percent of all her cases were constant to within 5 points 
in I.Q. and extreme variable cases showed various causative factors.

Other groups—Gildea and Macoubrey (37) reported on 431 white 
patients of the Institute for Juvenile Research, Chicago, and found that 
77, or 18 percent, had I.Q. changes of more than 10 points. They compared
73 of these with a similar number whose I.Q.’s had changed 5 points or 
less. Degree of cooperation, attitude toward examiner, and speed showed 
no relationship to variability, and degrees of reflectiveness and attention 
were slightly more favorable in the group whose I.Q. advanced. Improve- 
ments in physical condition, in parental attitudes, and in symptomatic 
behavior were definitely associated with I.Q. variability. Richey (94) made 
a comparison of 204 children whose naso-pharyngeal condition was satis- 
factory with 100 needing attention, and with 104 who had had their 
physical corrections. Removal of tonsils and adenoids had no measurable 
effect upon the I.Q. McAlpin (62) tested all the negro children in the 
3A and 5A grades of two schools in Washington, D. C. The 3A children 
born in the District had an average I.Q. of 98.1; those born outside, 92.1.
Similar figures of 95.1 and 89.7 were found for the 5A grade. The author 
concluded that the favorable environment stimulated mental capacities 
in the District. 

Mental variability in adult levels and senescence—This topic is also 
discussed in Chapter II with respect to test applications. The chief studies 
have been made by W. R. Miles (68, 71, 72) who gave a series of psycho- 
logical tests to persons from ten to ninety years of age. He found a rise 
in certain mental functions up to middle life, with considerable decline
at senescence. Visual acuteness showed a slight decline up to age forty- 
nine, and 48 percent at age seventy to eighty-nine. In rotating speed of a 
small hand-drill mechanism, ages eighteen to twenty-nine showed the 
maximum scores; in hand promptness, ages fifty to sixty-nine; in foot 
promptness, ages eighteen to forty-nine; in immediate memory and judg- 
ment for position in spatial relations, ages thirty to forty-nine; in good 
judgment on an intelligence test, ages eighteen to twenty-nine. In general, 
mental efficiency at senescence declined about one-half of the maximum, 
which was reached at or near the age of fifty years. Sorenson (103) tested 
641 university extension students ages sixteen to seventy with the Minne- 
sota College Aptitude Test and Reading Examination and found that the 
curve for vocabulary ability increased with chronological age but para- 
graph meaning remained relatively constant. Mental use or disuse were 
chief causes of changes in curves of adult ability. 
Test Standardization and Variability


A general discussion of statistical procedures is presented in Chapter
IV. However, a report seems necessary here on specific studies dealing 
with mental-test standardization and reliability. 

Test standardization—C. W. Brown, Bartelme, and Cox (16) presented 
a discussion of a scoring device for use with tests scaled according to the 
Thurstone absolute scaling technic. They used Gesell’s Developmental 
Schedule and the California First Year Mental Scale for sampling material 
and presented advantages and disadvantages of the method. Several inde- 
pendent investigations have been made on the standardization of items on 
the Stanford-Binet. Wallin (122) and Phillips (83) showed the need for
better standardization at its adult levels. Madden (66) found an increase 
rather than a decrease in the successes in year 9 over 8 for mentally slow 
children. Stoke (107) discovered that tests one and six of year 8 were 
more difficult than any other tests of 8 or 9. Test six in year 9 was found 
to be the easiest of that year, and Madden had found it to be second most 
difficult. Skalet (102) reported that tests involving interest in numbers, 
time, and geometrical form were relatively difficult, while tests of com- 
prehension were easy. The author recommended the reevaluation of the 
I.Q. in the light of test difficulty. Louden (61) studied the two Stanford 
vocabulary lists and found that List 1 is decidedly easier than List 2 at the 
lower mental levels, approximately equal at the fourteen-year level, and
more difficult above that point. 

R. B. Cattell and Bristol (23) selected a new series of tests, inference 
stories, and puzzle boxes, and concluded that tests involving action and 
concrete material were most attractive to children but made the least
demand upon intelligence; the best tests require either eduction of rela-
tionships or effectiveness of immediate memory. Radosuska-Strzemecka
(91) standardized defining 15 words out of a list of 100 on subjects five 
to nineteen years of age. He proposed a seven-step scale of definitions 
ranging from pointing to the object to complete generalization. Perkins 
(81) studied Stanford-Binet test performance with respect to brightness. 
Tests IX-3, IX-1, XII-7, and XIV-4 are definitely “experience” tests, since 
more retarded children than superior children are able to pass them. 
Tests VIII-3, VIII-4, XII-8, X-2, X-3, XII-4, and XII-6 are passed more 
by superior than by retarded children. 

Other tests—Vernon (121) reported a bibliography of eighty-four titles 
on test diagnosis for aptitudes, intelligence, and mental defects. He criti- 
cized the standardization of the Rorschach ink-blot test but was favorable 
to its qualitative significance. Lossagk (60), Levitov (54), Elderton (30), 
Bowers (13), and Cavalcanti (24) reported standardizations of visual 
imagery tests and spatial relationships. Lendzion (53) and Lahy (52) 
standardized tests for arranging numbers in serial order, with a positive
correlation between short time of performance and good working methods. 
Wreschner (131) experimented with tests of judgment of characteristics 
common to several objects for children six or seven years of age. Attention
was easily distracted, the field of vision small, abstract thinking rare, and 
perseveration strong. 

Peterson and Peterson (82) reported on separate answer strips to allow 
for repeated use of blanks, and the Perfo-Scorer, the Thermo-Scorer, and 
the Chemo-Scorer with sensitive ink as devices to reduce the cost of test 
material. 

Arthur (2) presented the process of standardization of her Point Per- 
formance Scale. Thomas (111) standardized the Phillips Group Scale on 
5,900 children in Perth, West Australia, with the higher social group con- 
sistently better than the lower social group. Coutinho (27) added to the 
standardization of the Pernambucan revision of the Stanford-Binet for 
children three or four years of age. McElwee (64) presented norms for 
ages five to thirteen on the Ellis Memory Test for Objects. 

Test comparability—Several investigations have attempted to compare 
results of various intelligence tests, and to explain the differences. The 
writer has found that the discrepancy between group intelligence and 
Stanford-Binet mental ages is due to the fact that the group tests measure 
“area” of intellect and the Binet is more a test of “altitude” according 
to Thorndike’s description. Mentally retarded children earn higher and 
mentally accelerated children earn lower group than Binet mental ages 
on account of greater “area” scores for the retarded arising from greater 
chronological ages, and vice versa. This difference between tests may be 
approximated by deducting two months mental age for each year’s differ- 
ence between chronological age and group mental age. For example, a 
retarded pupil fifteen years of age testing nine years on the group test 
will tend to earn about eight years mental age on the Stanford-Binet; 
a mentally accelerated child six years of age, testing nine years on group 
tests, will tend to have a Binet mental age of nine years and six months. 
It will be noted that the identical group score for a mental age of nine 
years will be equivalent to Binet eight years in one instance and nine 
years six months in the other. P. Cattell (22) found both theoretically 
and empirically that there is a constant difference between the Binet I.Q. 
and Otis I.Q. at the extremes, especially the upper extreme. McElwee (65) 
found a correlation of .717 on 45 subnormal children between the Goode- 
nough Intelligence Test and Stanford-Binet. 
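
The approximation described above can be worked through as a short sketch. The function names are illustrative assumptions, not part of the original report; the rule itself (two months of mental age per year of difference between chronological age and group mental age) and the two example pupils are taken from the text.

```python
def estimated_binet_mental_age(chronological_age_years, group_mental_age_years):
    """Approximate Stanford-Binet mental age (years) from a group-test mental age.

    Rule from the text: deduct two months of mental age for each year by which
    chronological age exceeds group mental age. A negative difference
    (accelerated child) therefore adds months instead.
    """
    difference_years = chronological_age_years - group_mental_age_years
    adjustment_months = 2 * difference_years
    return group_mental_age_years - adjustment_months / 12


def years_and_months(age_years):
    """Split a decimal age into whole years and rounded months."""
    total_months = round(age_years * 12)
    return divmod(total_months, 12)


# Retarded pupil: fifteen years old, group mental age nine years.
print(years_and_months(estimated_binet_mental_age(15, 9)))  # (8, 0)

# Accelerated child: six years old, group mental age nine years.
print(years_and_months(estimated_binet_mental_age(6, 9)))   # (9, 6)
```

The same group mental age of nine years thus maps to two different Binet estimates, depending only on chronological age.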

Group test equivalent scores—A manual for determining the equivalence 
of mental age obtained from group intelligence tests has been prepared 
by Runnels (96). Moore and Trafton (74) reported on equating scores 
for 235 Mount Holyoke freshmen on Terman Group A, Otis S-A, Army 
Alpha Form 8, and Miller Form A. A comparison was made by Miller 
(73) on results of six annual individual examinations of 160 subjects 
tested with five group examinations on the same day. The group results 
compare very favorably with results of the repeated individual examina- 
tions. Thomson (112) reported on two English investigations on stand- 
ardization of group intelligence tests. Graf (39) explained discrepancies 
between German tests as due to differences in content. The Beltz test un- 
duly stressed arithmetic; the Bobertag-Hylla test overemphasized logical 
thinking. 

Durling (28) and Updegraff (120) arrived at different conclusions 
as to the reliability of Stanford-Binet on young children. Durling found 
them lower than Updegraff. Nemzek (77) found no consistent difference 
between Stanford and Herring I.Q.’s on 52 superior children. The question 
of a suitable C.A. divisor for Binet I.Q.’s was considered by Rappaport 
(92), who found a decreasing I.Q. for sixteen years, the smallest change 
for fifteen years, and no criteria satisfied in using fourteen years. The 
Heinis’ personal constant as a suitable substitute for the I.Q. was urged 
by Hilden (44) and P. Cattell (21). 
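
The divisor question can be illustrated with the conventional ratio I.Q. (100 times mental age over chronological age), where chronological age is capped at an assumed adult level. This is only a sketch: the function name and the example subject (mental age fourteen, chronological age twenty) are hypothetical and are not taken from Rappaport's data.

```python
def ratio_iq(mental_age_years, chronological_age_years, adult_divisor_years=16):
    """Ratio I.Q. = 100 * MA / CA, with CA capped at the chosen adult divisor."""
    effective_ca = min(chronological_age_years, adult_divisor_years)
    return 100 * mental_age_years / effective_ca


# Hypothetical adult: mental age 14, chronological age 20.
# The same test performance yields a different I.Q. under each divisor.
for divisor in (14, 15, 16):
    print(divisor, round(ratio_iq(14, 20, adult_divisor_years=divisor), 1))
```

For this hypothetical subject the I.Q. falls as the divisor rises, which is why the choice among fourteen, fifteen, and sixteen years matters for older subjects.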


New Tests and Revisions 


Group tests—The Cleveland Kindergarten Classification Test devised 
by Rockwell, Hawkins, and Connor (95) can be administered to groups 
of ten or twelve pupils at a time, and has ten subparts under the four 
heads of motor control, sense discrimination, English, and graphic expres- 
sion. A few parts must be administered individually. 

Pintner (88) has developed a group test in two forms (A and B) for 
grades 4 to 8 inclusive which requires about forty-four minutes to admin- 
ister, testing vocabulary, logical selection, arithmetical reasoning, best 
answer, number sequence, classification, opposites, and analogies. Henmon 
and Nelson (43) presented Forms A and B, one set for elementary grades 
3 to 8, the other for grades 7 to 12, with 90 items and a thirty-minute limit. 
Greene (40) devised the Michigan non-verbal series for ages five to twenty 
with four equivalent batteries for aiming, tapping, feature discrimination, 
and pencil maze. Norms are available on approximately 300 white persons 
with both sexes represented at each age level. 

Wells (127) prepared a revised short form of Army Alpha which com- 
pares favorably with the original form, although only four of the eight 
subtests are used. Thurstone and Thurstone (115) continue to issue annual 
revisions of the American Council Psychological Examinations for high 
schools and colleges which require sixty minutes of work time, usually 
with four tests of sentence completion, arithmetical problems, geometrical 
analogies, and synonyms-antonyms. Cleeton (25) devised the Carnegie 
Mental Tests for high school and college. C. C. and W. R. Miles (68) 
standardized the Otis S-A Test as a fifteen-minute test. Sargent (98) re- 
ported an adaptation of the Otis Classification Test suitable for blind 
children. 

Individual tests—The most ambitious project in this field is by Gesell 
and others (35) who have prepared an atlas of infant behavior, a 900-page 
work of two volumes with 3,200 action photographs. The first volume is 
a normative series on posture and locomotion; early perception and pre- 
hension; perceptual, prehensory, and adaptive behavior. Volume two is a 
naturalistic series of four boys and two girls of behavior patterns and 
episodes such as feeding, bathing, play, and sleep. Gesell, Thompson, and 
Amatruda (36) have presented an interpretation of the Atlas as to the 
genesis and growth of infant behavior. 

Bayley (8) devised the California first-year mental scale with a test 
schedule of items from numerous sources. Norms are available on about 
50 infants tested at an interval of about one year. Probst (90) standardized 
two forms of a general information test for children with 32 questions 
in 11 categories such as time, number, and simple mechanics. Kent (50, 
51) reported a written and an oral test for clinic use. The latter has norms 
on 500 cases over the range from six to fourteen years with three over- 
lapping point scales. 

Baker and Leland (6) announced the forthcoming publication of the 
Detroit Tests of Learning Aptitude, a series of nineteen point scales each 
standardized on about fifty children at each suitable age level, with the 
median of the mental ages affording a general mental age. This test covers 
a wide range of verbal, spatial, number, motor-manipulative tests, auditory 
and visual spans, orientation and social adjustment which are symptomatic 
of special abilities and disabilities. 
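
Taking the median of the separate mental ages as the general mental age can be sketched as follows. The nineteen scores below (in months) are invented for illustration, and the function name is an assumption; only the use of the median comes from the text.

```python
from statistics import median


def general_mental_age(subtest_mental_ages_months):
    """General mental age as the median of the subtest mental ages."""
    return median(subtest_mental_ages_months)


# Hypothetical mental ages (months) on nineteen point scales for one child.
scores = [96, 102, 108, 90, 114, 120, 99, 105, 111, 93,
          117, 100, 104, 110, 98, 106, 112, 101, 107]

print(general_mental_age(scores))  # 105

# One extreme subtest score leaves the median unchanged.
with_outlier = [180 if s == 120 else s for s in scores]
print(general_mental_age(with_outlier))  # 105
```

A median is less sensitive than a mean to one unusually high or low subtest, which suits a battery meant to expose special abilities and disabilities.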

Cornell and Coxe (26) devised a Performance Ability Scale with seven 
tests including the Manikin-Profile Test, the Block-Designs Test, the Pic- 
ture-Arrangement Test, the Digit-Symbol Test, the Memory-for-Designs 
Test, the Cube-Construction Test, and the Picture-Completion Test, which 
is designed to measure non-language ability. Descriptive material is given 
on the following groups: American-foreign, manual-verbal, verbal-mark- 
edly-handicapped-in-language, emotionally dull-volitional, social-non- 
social, extrovert-introvert. This scale offers a fine supplement to the usual 
verbal tests, and tests of this type are certain to have much wider use in 
clinical applications than they have had in the past. 

Foreign tests—There has been considerable activity in the development 
of foreign intelligence tests which will be mentioned by author and with- 
out citation of specific reference, according to countries. 


A 1932 Mental Survey Test was devised and used by the Scottish Council for Re- 
search in Education. Marshall standardized Spearman’s “Measure of Intelligence” on 
children of Perth, Australia. Hales reported a standardization of Army Alpha for 
Sydney, Australia. Chinese tests were reported by Hsiao who describes an intelligence 
test of multi-adaptability chiefly of number content, and by Ou-Ni-Lin who describes 
a Chinese version of Binet-Simon. South American activity at Pernambuco was typified 
by Aranjo on Ballard’s test, by Barerto on a test similar to Army Alpha and Dearborn 
Tests, Oliveira on an Alpha test, Campos on the Binet-Simon. German projects were 
represented by Norden on new revision of Bobertag’s version of Binet-Simon, by Hetzer 
and Koller with a series of four tests for the second year of life, by Schlotte on a test 
based on 1,000 dull children at Leipzig, and by Schlag with a traditional group test 
for elementary-school children. In France and Belgium Simon reported a five-minute 
examination for retarded children similar to Binet-Simon, and Frickx standardized 
the Simon P-V test on Brussels elementary children. Foucault, Piéron, Lahy, Decroly 
and Segers report various new test standardizations for the mentally gifted, and an 
adaptation of the Ballard test. 


Summary and Suggested Research 


Research into the nature of intelligence has been facilitated by the 
development of many new tests of specific traits and by Thurstone’s methods 
of statistical analysis. Non-verbal and performance tests have been devised 
for many foreign speaking groups, and for other special groups such as 
the psychopathic. The influence of preschool attendance upon intelligence 
is still a question of debate since there seems to be considerable gain 
in test scores. Measurements in adult levels show that growth continues 
later in life, and with a decline in senescence more definitely expressed 
than in any earlier investigations. Scores between different intelligence 
tests and between group and individual tests have been equated. A consid- 
erable number of new tests has been developed which attempt to be more 
diagnostic of special abilities and disabilities. 

Further research is necessary in all of these fields and in the relation- 
ships of intelligence to specific educational progress, to vocational success, 
to personality deviations, and to many similar factors. 








CHAPTER II 
Applications of Intelligence Testing 





INTELLIGENCE TESTING continues to serve the school, the court, the clinic, 
and the research laboratory in the study and adjustment of children and 
adults. Readiness for instruction is verified through the administration 
of intelligence tests in many schools. Waste resulting from school failure 
is transformed into school success when mental abilities are considered in 
school assignments. Predicting high-school success has become a para- 
mount issue in large high schools enrolling a cross-section of a hetero- 
geneous population. Scholarships are awarded on the basis of intellectual 
maturity shown by the candidate, maturity gauged in part through stand- 
ardized tests. Scholastic aptitude tests are regularly included in college 
admission requirements. 

Intelligence testing aids the psychologist, the mental hygienist, the 
clinician in determining the mental status of patients and subjects, in pre- 
dicting mental growth, in formulating therapeutic measures, in the dispo- 
sition of criminal and delinquent cases. In the courts there is an increasing 
tendency to suit the punitive measures to the culprit in proportion to his 
mental responsibility. Psychological service in all its aspects becomes 
more reliable and effective through the use of mental capacity measures. 

Personnel selection in industry, in civil service, in the professions, is 
facilitated by mental development and intelligence tests. From policeman 
to college professor, the vocational guidance expert inquires to what extent 
general mental ability is required in a given occupation, and to what extent 
the applicant has the requisite ability. For uniform ranking of all candi- 
dates the test technic proves indispensable in vocational guidance service. 
Candidates for professional training in nursing, medicine, teaching, take 
mental aptitude tests which constitute a part of the prognostic test battery 
used for selection. 

Intelligence testing has proved its value as a tool for the study of mental 
development from infancy to old age; for determining the interrelation- 
ships of intellectual qualities; for determining the distribution and central 
tendencies of intelligence in population groups, and the range of individual 
differences in age, sex, and racial groups; for establishing the relation 
between mental and physical qualities; and for studying the gifted and 
subnormal individuals in the general population. 

The effect of improved physical condition on thinking capacity, of vary- 
ing environmental factors on mental development, is tested in part through 
mental measurement of population groups. Social problems, immigration, 
sterilization, birth control, crime control are being studied through the 
mediation of scientific mental aptitude measurement technics. 

A more critical attitude toward intelligence measurement, as the out- 
come of continued experimentation, has resulted in more authoritative 
research findings, more sensible and intelligent interpretation of data. 
Testing instruments of improved reliability and validity are now available 
for research purposes. 

Conflicting results from similar research studies still persist due to 
variations in sampling, inadequate controls, lack of refinement in measur- 
ing instruments, inadequate or improper test standardization. 

Applications of intelligence testing will be reported in this chapter 
under the following captions: 


1. General Interpretations and Surveys 

2. Intelligence Testing for Scholastic Purposes 

3. Clinical Applications of Intelligence Testing 

4. Vocational Guidance 

5. Individual Differences in Intelligence and Mental Development 
6. Relation of Intelligence to Other Traits 


General Interpretations and Surveys 


In a comprehensive textbook for college students, Boynton (148) dis- 
cussed the nature of intelligence and mental measurement methods. Garrett 
and Schneck (200) prepared a book to be used as a text and laboratory 
manual for the training of students. Nihard (281) wrote a text for the 
initiation of teachers in the test method. Pressey and Pressey (301) revised 
an earlier edition of an introductory handbook in the use of tests. Webb 
and Shotwell (353) included material on intelligence and achievement 
testing from nursery school through the elementary grades in a textbook 
on measurement. Most of these publications contain extensive bibliog- 
raphies. A summary of books and articles on intelligence testing has been 
prepared each year by Pintner (296). Reymert (305) reviewed mental 
testing conducted in colleges, schools, and clinics during 1932. 

Thurstone (342) applied his multiple factor analysis theory to the study 
of mentality. He suggested that the vectors of mind are: verbal ability, 
perceptual relations, arithmetical ability. Wallis (351) made a critical 
examination of some concepts in the field of testing children. Colucci 
(166) evaluated the mental testing movement. Crawford (171) urged 
caution in the interpretation of test results; F. S. Freeman (198) com- 
mented on the improper use of psychological tests; and Zachry and Lloyd 
(367) suggested that the proper interpretation of intelligence test results 
required judgment, experience, and wide study of each individual case. 

The widest scale survey of school children yet reported was made by 
the Scottish Council for Research in Education (322). Uniform mental 
examinations were given to 87,498 children between the ages of ten and 
a half and eleven and a half, practically the entire population of such 
children in Scotland. Fick (189) reported a mental survey of 25 percent 
of the school children in the Union of South Africa. Wood (365) reported 
the aptitude and achievement testing results in a large number of inde- 
pendent schools, and Woody (366) the testing program outcomes in a 
Michigan school system. Hildreth (216) prepared a bibliography cover- 
ing the entire field of mental and educational testing, listing about 3,500 
separate items. Kiesow (231) reported an effort in Italy to prepare a 
national collection of mental and physical tests, including apparatus, 
technics, tests, and questionnaires. 


Intelligence Testing for Scholastic Purposes 


Brown and Lind (156) concluded from intelligence and achievement 
tests of retarded and average school children that the relation of achieve- 
ment to mental age depended not so much upon level of intelligence as 
upon the position of that level in the group instructed. Finzel (192) showed 
as the result of tests that the lack of relation between intelligence and 
school achievement was due to personality and developmental factors. 

Heilman and McKee (213) studied the relation of achievement to intel- 
ligence and duration of school training and found that educational achieve- 
ment is determined more by intelligence than length of school training. 
Bobertag (144) compared test scores of school children with their school 
marks. The tests had a higher coefficient of variability than the marks and 
the constancy of test results was better than that of the marks. The validity 
of test scores was .70. Witty and Brink (363) urged the adaptation of 
instruction to maturation levels. Foucault (194) emphasized the impor- 
tance and described methods of measuring mental ability in school children. 

Richards (307, 308) used a series of tests, including the Stanford and 
several auditory, visual, and form-board tests, in studying the abilities 
of first-grade children. He reported the relationship of psychological tests 
to school progress for 326 children. The average progress index was 90. 

West (355) studied achievement resulting from ability grouping in the 
elementary school using over 4,000 pupils in grades three to seven. Results 
were given in terms of grade variability and needed adjustment in separate 
subjects. M. E. Broom (154) found a correlation of .73 to .80 between 
school achievement and mental tests in an elementary school. Engelhart 
(187) ascertained the contribution of mental ability to arithmetic problem 
solving and found that 25 percent of the variation in problem solving 
ability was due to variation in mental ability. Steiner (332) determined 
the annual variation in intelligence of first-grade children. Intelligence 
test scores proved to be a partial indication of success in later grades. 
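
The correlation and percent-of-variation figures in this paragraph can be put on a common footing, assuming each is read as a simple two-variable relationship in which the share of variance explained is the square of the correlation. That reading is an assumption; the review does not state how Engelhart's 25 percent was derived. The function names are illustrative.

```python
import math


def variance_explained(r):
    """Share of variance accounted for by a simple correlation r."""
    return r ** 2


def correlation_from_variance(share):
    """Size of the simple correlation implied by a share of variance."""
    return math.sqrt(share)


# Engelhart: 25 percent of the variation attributed to mental ability.
print(correlation_from_variance(0.25))  # 0.5

# Broom: correlations of .73 to .80 between achievement and mental tests.
print(round(variance_explained(0.73), 2), round(variance_explained(0.80), 2))
```

Under this reading, Engelhart's figure corresponds to a correlation of about .50, and Broom's correlations correspond to roughly half to two-thirds of the variance.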

High school—Intelligence tests have been used for many purposes in 
the study of high-school problems. M. E. Broom and De Silva (152) con- 
cluded that achievement test batteries were reliable measures of the mental 
ability of junior high-school pupils, assuming that these pupils have had 
equal opportunities to learn. Turney and Fee (346) determined the relative 
value of five group mental tests applied to junior high-school pupils. 
Applying the criteria of discrimination and validity the tests were in ranked 
order: Otis Self-Administering, Terman Group Test, Haggerty Delta 2, 
National, McCall Multi-Mental Scale. The first three are of about equal 
value. Bowman (147) studied variability in arithmetic problem solving 
by high-school pupils in relation to intelligence. There was less variation 
in performance shown by pupils of higher intelligence. Partial correlation 
between intelligence test results and secondary-school achievement com- 
puted by Collier (164) proved the intelligence tests to be the best predic- 
tive measure of several used, but a combination of three predictive meas- 
ures proved to be most reliable. King (232) compared the value of 
nineteen mental and fourteen interest factors in high-school prediction and 
found the mental test the more reliable. Maller (258) determined that 
scholastic success in high school can be predicted from age at entrance 
with as much reliability as from intelligence measured by standard tests. 
The intelligence quotient is slightly superior to raw score for high-school 
sectioning, according to the results of Symonds (335). 

A survey by Mayer (268) of secondary-school pupils in several types 
of German schools proved the pupils enrolled in gymnasia to be most 
gifted. The pupils in the Bürgerschule did well; those of the Realschule 
poorly. Dowd (182) determined that high-school pupils are an intellec- 
tually selected group, judging from group intelligence test results obtained 
in the sixth grade. For predicting success in algebra, Ayers (140) found 
prognostic algebra tests, a reasoning test, and teachers’ estimates to be 
superior to the Terman Group Intelligence Test. Torgerson and Aamodt 
(344), however, found an algebraic ability test, an algebra prognosis 
test, and an intelligence test to be about equally valid for predicting 
algebraic success. 

College—College prediction and achievement have been the subject of 
considerable research. Asher (138) found the partial correlation technic 
of little value in analyzing educational problems when applied to intelli- 
gence and English test results. Whether or not knowledge of intelligence 
test scores influences an instructor’s scholastic grades was investigated 
by Constance (168), who found a positive influence in one instructor's 
case. Davis (176) surveyed the intelligence and achievement of 1,4(( 
college students in Kentucky and sought to determine their relationship. 
The scholarship average of each college approximates a general average 
regardless of intelligence average. M. A. Gordon (204) found no signi- 
ficant relationship between the student’s intelligence and the amount of 
school work he was carrying. Henry (214) investigated the relationship 
of aptitude test data to fall quarter grades in ninety-nine cases. Keys and 
Reed (230) found in comparing summer and regular session students on 
the American Council Test that the summer group had a larger proportion 
of superior students, an equal number of inferior students, superiority 
in the teacher and school administrator group, and superiority in the high- 
school compared with the elementary-school teachers. The predictive value 
of the groupings of the Thorndike Intelligence Examination was studied 
by Lefever (243) who found the best predictive measure for success in 
the freshman year to be the total Thorndike score. 





Masters and Upshall (266) found that normal-school students gained 
in scores on repeated Thorndike examinations. Results obtained by 
Messenger (271) indicated that poor students can be eliminated from 
teachers colleges before entrance by the application of a battery of intelli- 
gence and achievement tests. In order of predictive value for success in 
college mathematics, the most effective technics proved to be the American 
Council tests, English, chemistry, mathematics tests, according to the re- 
port of Perry (292). Roucek (313) experimented with intelligence and 
knowledge tests applied to students in the Charles University in Prague. 
Fifty-eight percent of American colleges included in a survey reported a 
gain in average student intelligence test score during the years 1930-33, 
according to Thompson (339). Wagner (349) found test scores on Regents 
examinations more valid than intelligence test scores in predicting the 
scholastic success of college students. Waits (350) reported a high differ- 
ential predictive value for the American Council Psychological Examina- 
tion. Watson (352) used four factors in predicting success of Kentucky 
University freshmen, namely, high-school scholarship, intelligence test 
scores, English achievement, and mathematical test achievement scores. 
In administering mental tests to 1,800 students at the University of London, 
H. D. J. White (357) found that the ranking in departments from highest 
to lowest was as follows: arts, science, medicine, librarianship and laws 
(tied), engineering, journalism, architecture and fine arts. No significant 
difference was found between the scores of men and women. Students with 
low scores but good scholarship concentrate better and work longer hours. 
Those with high scores and poor scholarship records have wider interests, 
more anxieties, are less healthy. Whitmer (360) proved the value of giving 
freshman probationers assistance and guidance through a comparative 
study of groups who had and had not received such assistance. Williamson 
(361) discovered that the intelligence tests were most successful in the pre- 
diction of scholastic success of high aptitude freshmen, whereas, high- 
school percentile rank was more successful in predicting good scholarship 
of low aptitude freshmen. Wolcott (364) found a correlation of .809 be- 
tween initial and subsequent Thorndike Intelligence Test scores three and 
a half years apart. 

Professional—The use of intelligence test scores in scholastic prediction 
for student nurses is reported by several research workers. Bregman (149) 
obtained intelligence test scores for 10,000 student nurses. They ranked 
as a group below college freshmen level, but above the high-school norms. 
More highly selected groups were found in institutions of higher rank. 
Frankford (195), applying a group test to student nurses entering the 
training school, found a correlation approximating .80 between these test 
results and subsequent performance. After two years the number satis- 
factory in performance who scored originally above the 50th percentile 
was much larger than those who scored below. Habbe (209) found that the 
psychological test scores of those student nurses resigning were below those 
succeeding. 

W. B. Jones and Iffert (226) concluded from results of intelligence tests 
given to 777 prospective student nurses that measures of knowledge and 
ability can be used to advantage in selection. A test of aptitude for nursing 
added little to the psychological test. Potts (300) found that the scores of 
student nurses retained beyond the probationary period were substantially 
higher than scores of those not retained. 

Rhinehart (306) attempted to determine the value of a battery of tests 
for predicting success in both academic work and practical accomplish- 
ment. The American Council Psychological Examination proved of greater 
value in predicting grades than the Stanford-Binet. A group of student 
nurses beyond the probationary period proved to be more intelligent than 
a college group of the same age in a study made by Rosenstein (312). 

Triplett (345) found a group of commercial college students to be only 
slightly below college norms for the American Council Psychological 
Examination. Ullman (347) found little predictive value in a variety of 
teacher ability ratings, among them intelligence test scores. R. V. Jordan 
(228) summarized several studies of teacher and student-teacher intelli- 
gence. The data indicated that student teachers as a group rank at the 
median in terms of college entrance test norms. 


Clinical Applications of Intelligence Testing 


Delinquents, prisoners, and criminals—Armstrong (135) investigated 
the parental stock of juvenile delinquents arraigned in a children’s court. 
Twenty-eight percent were of Italian stock, 20 percent Russian, and 12 
percent colored. These percents vary from the proportion of these groups 
in the general population. The delinquent children averaged 77 I.Q. 
Beane (142) surveyed 300 delinquent girls and analyzed the results with 
respect to intelligence level, social, economic, and school training fac- 
tors. Intelligence quotients for incarcerated delinquent boys as reported by 
Charles (161) have close agreement when results from several tests are 
compared. Cochran and Steinbach (163) found that the delinquent chil- 
dren do not yield the highest amounts of recidivism. The performance 
of feebleminded and delinquent subjects was determined by Knight (237) 
through the use of performance and verbal intelligence tests. The per- 
formance quotients were higher for all groups than the intelligence 
quotients. 

Lane (241) summarized data for delinquents in the St. Charles School. 
On the Otis Test the median I.Q. was 88.2. On the Binet in 145 cases it 
was 80.7. McClure (256) found the mean intelligence quotient of 600) 
juvenile delinquents aged from seven to seventeen to be 79. The average 
for the boys was slightly higher than for the girls. Rogers and Austin 
(311) obtained I.Q.’s for over 3,000 juvenile delinquents. A normal dis- 
tribution curve was found with a mean at 82.2. Retests after several years 
showed correlations of .63 to .82. Ruggles (314) analyzed the factors con- 
tributing to juvenile crime in a group of boys sixteen to twenty-two years 
of age sent to a prison farm. The group of boys studied proved to be much 
below the average in intelligence and mechanical ability. The factor of 
intelligence deficit is given undue importance among other causative fac- 
tors of juvenile delinquency in the opinion of Steinbach (331) as the 
result of investigations in the juvenile courts of Norfolk. R. K. White and 
Fenton (359) discovered, among other facts in comparing groups of bright 
and dull delinquents, that forgery was the only offense associated with 
higher intelligence. 

The median intelligence of unmarried mothers was found to be 76 in a 
study reported by McClure and Goldberg (255) confirming results of an 
earlier study by them. Growdon (207) reported a mental survey of 2,185 
adult female delinquents in the state of Ohio and concluded that for white 
and negro races and two classes of delinquency, the reformatory prisoners 
fall below the mental ratings of their respective races in the general popu- 
lation. Mennens (269) reported successful differentiation of prisoners 
through administering the Healy Completion test. Recidivists and first 
offenders of average and defective intelligence have equal chances of suc- 
cessful subsequent adjustment, according to Shimberg and Israelite (327) 
who studied groups of average and defective offenders. 

Defectives—Aldrich and Doll (132) compared the development of idiot 
boys nineteen to thirty-eight months old chronologically with normal in- 
fant boys of the same age range on a series of three development scales. 
The idiots were superior on the performance tests of the Stutsman series, 
but inferior on the language tests of the Kuhlmann-Binet Series. The 
same authors (133) studied problem solving in idiots through reaction to 
tools. Individual differences were found among the experimental group. 
Arthur (137) reported the same mental classification for the majority of 
over six hundred feebleminded inmates of a state school after retesting, with 
intervals between tests ranging from one to five years. Doll (181) at the 
request of New Jersey surveyed the status of the feebleminded in the state. 
He reported incidence and educational provisions. 

Durling (184) reviewed the literature on economic status in relation to 
I.Q. and studied the employment records of high-grade mental defectives. 
She concluded that the defective can do work of a routine nature success- 
fully, but only under constant supervision. Fischer (193) used problems 
in abstraction, definition of concepts, grasping relations, and criticism to 
diagnose mild degrees of feeblemindedness in adults. Gordon (205) used 
the Merrill-Palmer Pre-school Scale successfully with low-grade mental 
defectives. The normal preschool child has a better command of language 
than the mental defective of the same age. H. E. Jones (225) differentiated 
the abilities as shown by tests for adult and juvenile defectives. On the 
Binet sub-tests groups equivalent in mental age or opportunity showed 
marked differences. 

Murphy (278) analyzed data for nearly seven thousand cases who 
received mental examinations at the Psychological Clinic of the University 
of Pennsylvania. Twenty-nine percent were diagnosed as feebleminded. In 
the past five years the incidence of feebleminded cases has declined due to a 
change in the attitude of the public toward the clinic. An experiment with 
a group test for distinguishing defectives from normals among groups of 
older school children was reported by Otte (286). Parker (289) investi- 
gated the educational proficiency of subnormal children in reading, 
arithmetic, and spelling as compared with standards for normal children. 
Portenier (298) found from studying test records in eight cities that the 
average I.Q. of the high-school population has declined slightly during 
the past decade and the percent of low mentality pupils who graduate has 
increased. The low mentality high-school pupils are a selected group with 
reference to economic and personality factors. 

Porteus (299) revised the original maze test series and proved its 
usefulness in studying the behavior of defectives and delinquents as well as 
for general examining. Shimberg and Reichenberg (328) questioned the 
social adjustment of those feebleminded who never reach an institution. 
They studied 189 cases. Those who adjusted successfully had slightly 
better heredity, were from better homes, had more favorable personality 
traits, were those for whom recommendations made were carried out, and 
were better supervised. From a study of the mental status of children of 
mothers who were inmates in an institution for the feebleminded, Vanuxem 
(348) found that half of the children tested rated higher than the mothers, 
and a considerable proportion were equal in status when compared with 
the mother. 

The gifted—Decroly (179) emphasized as the result of research the need 
for measuring the non-verbal as well as the verbal abilities of gifted chil- 
dren. Burkersrode (157) devised a new series of tests for the selection of 
talented children in their fourth year of school. The tests were used in con- 
junction with an observation record. Gifted children, discovered through 
a battery of intelligence, achievement, and aptitude tests employed in the 
Iowa high-school survey, were further studied through questionnaires sent 
to parents. The group were found to come from superior homes, graduate 
early, and attend college. 

A report by Manrique (263) described the use of intelligence tests in 
selecting gifted children for scholarship grants in Spain. Otto (287) at- 
tempted to select by means of tests the more gifted children who should 
go on to high school. From results of the tests used he drew deductions 
concerning the nature of intelligence. Moore (277) made a cumulative 
four-year study of students graduated from high school before sixteen 
years of age. The students were superior at graduation in intelligence and 
achievement and maintained their superiority throughout college years. 
Scheidemann (319) surveyed an opportunity room for gifted children. 
He found the group to be superior in achievement and intelligence, and 
less well adjusted emotionally than a comparable group of normal chil- 
dren. Sylvester (334) described in detail five cases of gifted children 
demonstrated at the University of Pennsylvania. The Harvard growth study, 
according to Cattell (160), indicated Binet I.Q. increase with age for gifted 
children. The earlier Stanford study had shown more decrease. The different 
results are attributable to selection and difference in ages of the group 
tested. Hollingworth (219) found that problems of emotional immaturity 
in children who test above 130 I.Q. observed in high school tend later 
to disappear. 

Other clinical applications—Bronner (150) urged the use of psycho- 
logical tests in clinical practice in studying unevennesses in abilities. 
Camp (158) advocated the interpretation of tests in clinical practice in 
the light of the patient’s developmental, school, and medical history and 
heredity, and Crane (169) found qualitative examination data as significant 
as total quantitative test results. The same author (170) showed through 
a case history the value of thorough testing for diagnostic purposes. 
Jastak (221) compared responses of normal school children to a variety 
of test items with responses of children with personality difficulties or 
behavior disorders. The vocabulary items were least affected by per- 
sonality disorders. More complex performance test items were more sensi- 
tive to mental instability effects. Wires (362) found the reactions of 
patients in a psychopathic clinic afforded an excellent basis for diagnosis. 


Vocational Guidance 


Brown (155), discussing the use of intelligence tests in vocational guidance, 
pointed out difficulties due to overlapping in ability of occupational 
groups. Pond (297) prepared intelligence score distributions for 9,000 
factory men divided into 44 occupational groups. Great overlapping from 
group to group was found, but averages and range in both test score and 
schooling correlate better than .74 with an occupational ranking based 
on estimates of intelligence required for success. Reifenrath (304) found 
no positive correlation between general and technical or commercial in- 
telligence in several thousand subjects in various vocational groups. 
Christiaens (162) discovered that intelligence measures using only verbal 
material were inadequate for vocational guidance. He has found the 
Decroly Box Test helpful in vocational prognosis. Fryer and Sparling (199) 
evaluated intelligence testing for vocational success prediction purposes. 
Farmer (188) found a low correlation between an entrance test and a final 
proficiency test in skilled trades, but the correlations were raised with the 
addition of intelligence test ratings. Thorndike and others (340) in a 
follow-up study of 2,000 children examined at fourteen years of age 
found low correlations between early test data and subsequent vocational 
success. 


Individual Differences in Intelligence and Mental Development 


Anderson and Scheidemann (134) made detailed reports of three sets 
of triplets, including developmental history, mental and personality test 
results. Finch (191) obtained for 1,023 pairs of siblings a correlation of 
.49. Outhit (288), comparing the intelligence ratings of siblings with each 
other and parents with their children, found correlations of .42 to .72 
between first-second and first-third sibling. Between mid-parent and mid- 
child, correlations of .77 to .80 were found, and between single parent 
and single child, .40 to .68. H. D. J. White (357) compared the intelligence 
ratings of twenty-six pairs of twins and obtained correlations of .82 using 
the Binet test, and .92 using the Kuhlmann. 

Sex differences have been determined by several investigators. Book 
(145) found among university students that men excelled on maze and 
block counting tests, and the women on number checking and pattern 
recognition. Employing the Goodenough Drawing Test method De Oliveira 
(180) observed that up to the age of nine the girls are superior, after this 
the boys achieve the higher mean score. Conrad, H. E. Jones, and Hsiao 
(167) analyzed army test results for sex differences in groups of children 
and adults in a rural community and found the females superior to the 
males. The superiority varied with sub-tests. 

F. N. Freeman (196), studying individual differences in mental growth, 
found that slower growth children do not necessarily reach the end of their 
mental growth period earlier than do the faster growth children. 

F. S. Freeman (197) summarized findings on the extent of individual 
differences, influence of environment, race and nationality, differences due 
to sex, factors of age, physical development, and personality. 

Marked correspondence in occupational level of parents and their chil- 
dren characterized the data obtained from the cross section of a typical 
school population by Hildreth (217). A. M. Jordan (227) studied the 
influence of parental occupation on test scores for over 1,200 school children 
in grades one to seven. Substantial differences in mean test-scores for 
various occupations were found. In the genius group appear only children 
of professional workers. The median score increases with improvement in 
economic level. 

Davidson (175) conducted an experiment in which bright, average, and 
dull children all at the four-year mental level were given practice in 
reading. The bright group proved to be noticeably superior, the dull 
noticeably inferior in reacting to the experimental material. Driscoll (183) 
compared the reliability of Merrill-Palmer and Kuhlmann materials in 
predicting future development of young children. Prognosis was more 
accurate from the preschool composite rating than from the I.Q. alone. 
A statistical summary made by Louden (248) proved the superiority of 
the bright children over the dull children in Stanford-Binet vocabulary 
response. 

In a large adult population Grace (206) found no relation between age 
and mental ability. H. E. Jones and Conrad (223) in studying growth and 
decline of intelligence in a group ranging in age from ten to sixty, found 
a decline by age fifty-five amounting to recession to the fourteen-year level. 
Miles (273) summarized the results of the Stanford Later Maturity Study. 
Declines were universal in all traits tested with advance in age, but decline 
is less rapid in some traits than in others. The same author (272) constructed 
ability-at-age curves for ages ten to eighty-nine for sensory acuity 
and mental traits, and found that maximum ability in a wide range of 
functions occurs between eighteen and forty-nine. Decline is not precipitous 
but progressive. Shacter (325) investigated the relation between sustained 
attention in school children and I.Q. Correlations were not of any appre- 
ciable size. 

Harter (210) investigated individual differences in 247 men and women 
subjects with digit-letter substitution tests. Individual differences due to 
practice persisted throughout the learning. Perl (291), using a series of 
mental, vocabulary, and arithmetic tests with fourth-grade school children, 
found increases in individual differences in three out of four tests and 
decreases in one test. Increases occurred in complex processes; decreases 
occurred in the simpler processes. Shultz (329) gave five symbol learning 
tests to college and technical school students a week apart, and found that 
the individuals varied quantitatively and qualitatively in performance ac- 
cording to the normal distribution curve. 


Relation of Intelligence to Other Traits 


Intelligence and physical status—M. E. Broom (153) found a very low 
positive correlation to exist between cranial capacity and scores on intel- 
ligence tests among college men and women. Low positive correlations 
were found to exist between physical and mental age and physical and 
mental quotient in institutionalized feebleminded boys, in a study made 
by Davenport and Minogue (174). Dawson and Conn (177) attempted to 
determine whether any relationship existed between disease condition and 
Binet test results in hospitalized children. Some illnesses are followed by 
more mental deterioration than others. Guilmartin (208) summarized the 
use of psychological tests with deaf children. Subjects of superior intelli- 
gence as rated by standard tests showed greater physical development on 
the average than those in a subnormal group reported by Lucena and 
Barreto (252). 

Maller (260) obtained data for 310 health areas in New York City with 
reference to vital statistics, juvenile delinquency, school progress and 
intelligence. He found a marked degree of intercorrelation in all factors 
studied. Marked national differences were found in each measure. Merry 
and Merry (270) employed with good success the finger test as a supple- 
mentary test of intelligence for blind children. 

Nilson (282) studied the intelligence test results of physically disabled 
children among an unselected school population. Results for the disabled 
children compared very favorably with those for the physically normal 
children. Patrick and Rowles (290) found negligible relationships be- 
tween physiological measures, vital indexes, intelligence, personality rating, 
age, and point-hour ratio among fifty-two university women. A narrow age 
range may have affected the relationships found. Richey (309) paired 
children with and without diseased tonsil or adenoid condition and found 
only small, unreliable changes due to continued diseased condition or its 
removal. No relationship was discovered to exist between measures of 
intelligence and physical condition by Schell (320) in a comparative study 
of siblings, one of whom had a diseased condition, the other not. Wellman 
(354) summarized studies reporting the relationship between physical 
maturation and mental maturity. No relationship was found to exist be- 
tween some measures of physiological maturity and mental development. 

Intelligence and environmental conditions—Armstrong (136) compared 
groups of urban and rural children with the Otis Intermediate Test and 
two performance tests. The rural group was superior in verbal and ab- 
stract intelligence to the foreign-parentage urban group. Children in either 
group, if of American parentage, of equivalent occupational class, and of 
equal educational opportunity, were similar in the abilities measured. 
Although it is impossible to predict intelligence or scholarship from socio- 
economic status, according to the conclusions of Cuff (172) an association 
was shown in college groups measured with the American Council Psy- 
chological Examination between the test results and rating of socio-economic 
status. Figuerido (190) discovered considerable influence of the social 
environment on test results in administering the Dounaievsky test to 
university level students. 

F. N. Freeman (196) concluded from studies of duplicate twins reared 
apart compared with those reared together that the former differed from 
each other in ability about twice as much as the latter. He found no evi- 
dence that slower children reach their mental growth earlier than faster 
children. Newman (280) reported a series of pairs of identical twins 
reared apart from infancy. The contrast in environment of the various 
pairs differed as did also the scores in mental and educational tests. Hicks 
and Ralph (215) practiced a group of nursery age children in tracing the 
Porteus maze with both preferred and non-preferred hand. The practice 
did not result in significantly greater skill. 

H. E. Jones, Conrad, and Blanchard (222) found, in comparing test 
results of rural and urban children, that the rural environment was a 
handicap. They concluded that a rural child moving to the city would 
increase his intelligence test scores merely as a result of changed environ- 
mental conditions. H. E. Jones (224) summarized studies of birth order 
and intelligence. He concluded that results were conflicting and that there- 
fore the findings were negative. Such studies are liable to error because 
of the many factors that must be controlled. Kawin and Hoefer (229) 
measured with Merrill-Palmer Tests two- and three-year-old children who 
attended nursery school and those who did not. No differences were found 
between children who had and had not attended nursery school. Kirihara 
(233) analyzed the results of intelligence tests for children coming from 
different social levels. He concluded that the relatively poor showing 
of children of the laboring classes was due both to inheritance and milieu. 
Liberman and Elperine (244) concluded from an analysis of test results 
that many tests give an undue advantage to urban subjects. From comparing 
initial and retest I.Q.’s of orphan-asylum children Lithauer and Klineberg 
(245) concluded that improvement in environment apparently has a 
favorable influence on the I.Q. Luria (254) studied the verbal reactions of 
children from different environments and concluded that these reactions 
can be evaluated only in terms of environmental opportunity. Maller and 
Zubin (259) studied the effect of rivalry motivation in test achievement. 
Although no increase in score was found, more items were attempted, 
errors increased, and greater variability in score resulted. In a study re- 
ported by Pintner (293), children from non-English speaking homes made 
better scores on primary non-language tests than on primary tests requiring 
comprehension of verbal directions. 

Pintner and Forlano (294) distributed over 17,000 I.Q.’s according to 
birth-month and found the lowest I.Q. level for each social group to fall 
in the months from January to March. Senour (324) concluded from a 
study of intelligence test results for children from foreign homes that 
verbal tests rate the children lower than non-verbal tests. He recommended 
the use of the non-verbal in preference to the verbal test. 

Sherman and Key (326) measured the intelligence of isolated mountain 
children and concluded that the children develop according to the demands 
of their environment. Intelligence was highest in communities of higher 
social development. Wheeler (356) concluded from the results of testing 
Tennessee mountain children that environmental factors materially affected 
the results. 

Syrkin (336) found superior test scores for children of Soviet officials 
as compared with children of workers. Even when vocabulary differences 
are eliminated the superiority of one group over the other remains. 

Schwesinger (321) summarized all recent material on genetic and en- 
vironmental factors as they affect the development of intelligence. Technics 
for the measurement of intelligence and personality are described. 

Snedden (330), repeating the same and different forms of the same test, 
found greater practice effect in repeating the same form of the test. An 
entirely different test resulted in still less practice effect. 

Intelligence and other traits—Attenborough and Farber (139) investi- 
gated the “G” factor among mechanical ability, intelligence, and manual 
dexterity tests administered to school children. Correct judgment of in- 
telligence from penmanship occurs only by chance, according to B. H. 
Broom and Basinger (151) who compared centile placement of adults in 
penmanship and intelligence. Carroll (159) obtained positive correla- 
tions between intelligence and literary judgment as the result of administer- 
ing a prose appreciation test and intelligence tests to high-school students. 

Dawson (178) concluded that the less intelligent population groups are 
increasing at a rate greater than the more intelligent and might in time 
greatly outnumber them. 

Durrell (185) observed that the results of group tests varied with the 
amount of reading required and concluded that group tests containing a 
large amount of reading material should not be used for the computation 
of intelligence or accomplishment quotients. Goodenough (203) reported 
that a fairly close relationship existed between drawing progress and 
general intellectual development. Hartge (211) determined the intelligence 
of high-school girls by three methods: intelligence tests, teachers’ ratings, 
graphology. There was 56 percent agreement among all three methods. 
Hildreth (217) concluded as the result of administering verbal and non- 
verbal tests to young bright children that these subjects were equally com- 
petent in both types of tests. 

Jaggers (220) compared the intelligence of 47 problem children and 
48 well-adjusted children above the fourth grade. On three tests the median 
I.Q. of the well-adjusted groups was 112; for the problem children, 96. 

Koester (238) investigated the interrelations of intellectual aptitudes, 
and talents for music, drawing, and technical skills. He concluded that in 
passing from the superior intellectual groups to the less intelligent groups 
the percent of subjects possessing a very definite special talent decreases, 
and those judged mediocre increase. Loomis and Moran (247) discovered 
that the use of “a,” “an,” “and,” and “the,” in written composition cor- 
related more highly with mental age than any other part of speech. Chil- 
dren studied by Lowry (251) who were given three months’ intensive train- 
ing in reading increased their I.Q.’s on a second intelligence test follow- 
ing the training period, compared with a test given before the training 
period, to the extent of 11.76 points. Matheson (267) compared the intelligence 
ratings of preschool children with response to problem-solving 
situations of the Kohler type and found positive correlations with mental 
and chronological age. Monnin (274) computed correlations between in- 
telligence and arithmetic from tests given to school children. Moore (276) 
concluded that the distinction between “linguistic and non-linguistic” is 
not exact. Although the functions are distinct, they may be measured in 
a single test, either linguistic or non-linguistic. Morison (275) reported 
that the speech habit, echolalia, was never observed in any but the mentally 
defective. Rust (316) studied resistance to tests in young children and 
found a negative correlation between resistance and intelligence quotient, 
and a positive relationship between resistance score and difficulty of the test. 
Taylor (337) found a high positive correlation between visual apprehen- 
sion and mental and chronological age in child subjects who were shown 
a number of toys. Tinker (343) observed that speed of response is asso- 
ciated with mental and scholastic test responses when speed and ability 
were measured on the same kind of material. 

Racial characteristics and intelligence—The intelligence and mental 
development of negro subjects has been reported in several publications. 
A selected bibliography on the physical and mental abilities of the Ameri- 
can negro has been prepared (323). 

Beckham (143) summarized the results of Stanford-Binet tests given to 
1,100 school children and found variations between test results and school 
grade, age, occupation of parents, family size. There was little correspondence 
between vocational aspirations and ability. Bousfield (146) measured 
the ability and achievement of negro school children in Chicago. The subjects 
rated normal on the non-language test, below norms on other tests. 

Klineberg (235, 236) concluded that cultural and environmental factors 
have an important bearing on intelligence test responses. Colored children 
were found by Long (246) to be on the average 4.76 points below the aver- 
age white child in intelligence quotient, but this difference he attributed 
to environmental factors. 

Nissen, Machover, and Kinder (283) used performance tests in evaluat- 
ing the mentality of native African negro children five to fourteen years 
of age in the area supplied by the American slave trade. The tested sub- 
jects were inferior to the norms for the tests. Results were more favorable 
in the tests less related to civilized content. Pintner (295) investigated 
intelligence differences between negroes and whites. The intelligence of 
negroes appeared to be lower than that of the whites. Price (302) re- 
viewed the literature on the problem of negro-white differences in intelli- 
gence and concluded that there has been no comprehensive measurement 
of negro intelligence, no valid comparison with whites. 

Stowell (333) in a comparison of white and negro institutionalized 
feebleminded children found the negroes to be younger and mentally 
superior to the whites. 

Thurmond (341) compared the intelligence and achievement of twelve- 
year-old negro children in a rural district of Georgia. The children proved 
to be retarded on the Arthur Performance, Stanford, and Illinois Tests 
to the extent of two or more years. Using the Arthur scores as a criterion, 
they were retarded one and a half years in spelling, three years in hand- 
writing. Using Stanford M.A. as a criterion, the retardation is less serious. 

Mexican children tested by Manuel and Hughes (264) proved to be 
equal to other children, grade for grade, in intelligence and drawing. 
Manuel (265) concluded that the Mexican child ranks below the average 
school child in Texas both in intelligence and school record. The inferior 
test ranking of Spanish-speaking children is not, as Sanchez (317) concluded, 
peculiar to Spanish-speaking children generally. The same author 
(318) found gains in all abilities tested in a group of Spanish-speaking 
children in New Mexico, and urged consideration of testing conditions in 
making racial comparisons. 

Collmann (165) compared the Otis Test scores of children of Victoria, 
Australia, with the norms and found no significant difference between these 
children and those on whom the tests were standardized. Oliver (284) 
made adaptations of test materials for measuring the mentality of Africans. 
Because of limited educational opportunities in the groups tested, little 
use was made of school knowledge materials. The same author (285) com- 
pared the abilities of English, French, and native subjects in their respective 
colonies. 

Several investigators reported studies of Jewish mentality. Maller (261) 
reported data concerning the intelligence of young Jews. Roback (310) 
reviewed the literature on the measurement of Jewish mentality and attempted 
to explain Jewish superiority. Another summary on the same topic 
was made by Rumyaneck (315). 

In measuring groups of part-Hawaiians with the Binet, Porteus Maze, and 
Healy Picture Completion Tests, Louttit (250) found children of 
Hawaiian-white-Chinese mixtures superior to other groups. Fourteen years appeared 
to be the upper limit of mental development in these groups. Eells (186) 
found the average I.Q. on the Stanford-Binet tests for Eskimo, Aleut, and 
Indian children to be respectively 73.67, 80.27, and 78.88, and on the 
Goodenough Drawing Scale 89.56, 93.29, and 91.55. I.Q. appeared to 
increase with greater admixture of white blood. Garth (201), who employed 
a carefully controlled technic in studying the intelligence and 
achievement of mixed blood Indian children, concluded that mixed-blood 
bears only a slight relationship to intelligence. Telford (338) compared 
Goodenough test performance of Indian, white, and negro children. The 
average I.Q. of the Indians was 88; of the whites, 100; and the negroes, 
77-79. Indians proved superior to whites on the mare and foal test. No 
significant correlation was found between performance and amount of 
Indian blood. Luh and Wu (253) found Chinese children in Peiping to be 
equivalent to American children in performance test achievement. 

Barke (141) used non-language and verbal mental tests in comparing 
children aged ten to fourteen in three bilingual schools in a Welsh-speaking 
mining district with children in monoglot schools in an English-speaking 
district. Children in the bilingual schools appeared to be slightly superior. 

Reichard (303) found differences in Slav and Jewish immigrants of 
both subjects associated with schooling, sex, and age. No racial difference 
was observed. As the result of non-language tests administered to immi- 
grants, Kolb (239) ranked racial groups in order of superiority as follows: 
Norwegian, English, Swedish, German, Irish, and Italian. The differences 
ranged in terms of test age from 1.5 to 6.5 years depending upon the test. 

Haven (212) obtained a higher intelligence quotient, but a lower 
achievement quotient for native-born as compared with foreign-born chil- 
dren when tested with the Otis Classification Tests (form fourth to eighth 
grade). No significant racial differences were found by Louttit (249), in 
comparing students of white, Japanese, Chinese, and Hawaiian ancestry 
in Hawaii on tests of immediate recall. Daniel (173) criticized existing 
studies of racial differences and proposed a list of sixteen criteria which 
must be met to make racial difference studies valid. Lambeth and Lanier 
(240) studied speed of reaction in groups of negro and white twelve-year- 
old boys and concluded that the more complex the test the higher the rela- 
tive score of the whites. Nassri (279) applied the Pintner-Paterson scale 
to French school children and concluded that national differences in com- 
paring the subjects studied with American subjects can be demonstrated. 





CHAPTER III 
Measures of Aptitude 


To the accompaniment of much healthy self-criticism the development 
of aptitude tests has gone on apace during the past three years. The litera- 
ture for the period contains over two hundred titles on aptitude tests for 
many lines of endeavor—advertising designers, engineering apprentices, 
book revisers, dentists, dressmakers, executives, modistes, policemen, sales- 
men, shoemakers, surveyors, teachers, weavers, and others. The largest 
number is reported in the German journals, followed by those of the United 
States and England. Also represented are France, Italy, Russia, Spain, 
Australia, Argentina, and Japan. 

The line between aptitude and achievement tests is always a thinly drawn 
one. Sections of many of the tests mentioned above are out-and-out trade 
tests. A general characteristic is the absence of published validity coeffi- 
cients. In an aptitude test this lack is far more serious than the general lack 
of published reliability coefficients, shown by Woody (478) to be char- 
acteristic of most published tests. Reliability can be had cheaply, usually, 
by simply lengthening a test. Validity responds but little to lengthening. 
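
The contrast between reliability and validity under lengthening can be illustrated with a short numerical sketch. It assumes the Spearman-Brown relation for a test made k times as long from parallel material, together with the companion formula for the validity of the lengthened test against an unchanged criterion. Neither formula is named in this review, and the starting figures (reliability .60, validity .30) are hypothetical.

```python
import math


def lengthened_reliability(r_xx: float, k: float) -> float:
    """Spearman-Brown: reliability of a test made k times as long."""
    return k * r_xx / (1.0 + (k - 1.0) * r_xx)


def lengthened_validity(r_xy: float, r_xx: float, k: float) -> float:
    """Validity of the lengthened test against the same criterion."""
    return r_xy * math.sqrt(k / (1.0 + (k - 1.0) * r_xx))


def validity_ceiling(r_xy: float, r_xx: float) -> float:
    """Upper limit of validity as the test is lengthened without end."""
    return r_xy / math.sqrt(r_xx)


# Hypothetical starting values, chosen only for illustration.
START_RELIABILITY = 0.60
START_VALIDITY = 0.30

for k in (1, 2, 4, 8):
    rel = lengthened_reliability(START_RELIABILITY, k)
    val = lengthened_validity(START_VALIDITY, START_RELIABILITY, k)
    print(f"length x{k}: reliability {rel:.2f}, validity {val:.2f}")

print(f"validity ceiling: {validity_ceiling(START_VALIDITY, START_RELIABILITY):.2f}")
```

Under these assumptions a fourfold lengthening raises reliability from .60 to about .86, while validity moves only from .30 to about .36 and cannot pass about .39 however long the test is made.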

Some articles report impressive validities based upon less than twenty 
cases. Rare indeed is the study which tells of an aptitude test having been 
validated upon some group other than the experimental group employed 
to determine the selection, weighting, or scaling of tests and items. Appar- 
ently most aptitude tests die aborning. 
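
Why validation upon the experimental group alone is untrustworthy may be seen in a small simulation. The sketch below is an illustration under stated assumptions, not a procedure taken from any study cited. Every item is pure chance and so has a true validity of zero; the five "best" of sixty items are chosen on eighteen cases, and the resulting composite is then tried on a fresh group. All names, sample sizes, and the seed are hypothetical.

```python
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def make_sample(rng, n_cases, n_items):
    """Items and criterion are independent noise: true validity is zero."""
    items = [[rng.gauss(0, 1) for _ in range(n_cases)] for _ in range(n_items)]
    criterion = [rng.gauss(0, 1) for _ in range(n_cases)]
    return items, criterion


def composite(items, chosen):
    """Unweighted sum of the chosen items for each case."""
    n_cases = len(items[0])
    return [sum(items[j][i] for j in chosen) for i in range(n_cases)]


def shrinkage_demo(seed=1935, n_items=60, n_keep=5, n_small=18, n_fresh=500):
    rng = random.Random(seed)
    # Select the items that happen to correlate best in the small group.
    items, crit = make_sample(rng, n_small, n_items)
    ranked = sorted(range(n_items),
                    key=lambda j: pearson(items[j], crit), reverse=True)
    chosen = ranked[:n_keep]
    apparent = pearson(composite(items, chosen), crit)
    # Apply the same item selection to a group not used in the selection.
    fresh_items, fresh_crit = make_sample(rng, n_fresh, n_items)
    cross_validated = pearson(composite(fresh_items, chosen), fresh_crit)
    return apparent, cross_validated


apparent, cross_validated = shrinkage_demo()
print(f"validity on the experimental group: {apparent:.2f}")
print(f"validity on a fresh group:          {cross_validated:.2f}")
```

The first figure is produced wholly by selection on chance differences, which is why a coefficient computed on the experimental group alone says little about the test.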


Common Misconceptions 


A number of common fallacies to be guarded against have been pointed 
out by Crawford (385), Kitson (418), Kingsbury (416), Hoppock (408), 
and others. These are as follows: 


1. One should not take for granted that a test with a certain published reliability 
and validity will have the same reliability and validity when administered to all other 
groups. 

2. One should not assume that the regression of all traits upon success is linear. For 
example, groups of exceedingly conventional or exceedingly conservative artists may 
both succeed excellently, while the intervening group may find its product unsaleable. 
An occupational trait profile constructed from a population of all artists or made up 
only of individuals from the two extremes is meaningless. 

3. It does not follow that the resemblance of an individual’s profile to the occupational 
trait profile of the average man in the occupation or even of the 75th percentile man, or 
better, in the occupation, is an indication of success in the occupation unless the traits 
making up the profile are those which, in combination, are highly valid in predicting 
differential success in that occupation. This is a caution against the hasty adoption of 
the ape-the-successful-man type of psychology. 

4. There is no indication of validity, as assumed by some, in the fact that a graph 
of the first quartile points and of the third quartile points parallels a graph of the 
median points on a series of tests arranged in a sequential order as in profiles. 

5. Consistent status on two or more highly interrelated (alternate forms of) tests or 
functions (labelled “mechanical ability,” “social intelligence,” etc.) is no evidence in 
itself of validity even if plotted in a profile. Alternate forms of tests or tests highly 
intercorrelated will yield essentially identical normated scores even though the validity 
of the tests in common is approximately zero. 

6. The drafting of a profile does not in itself imply validity. All tests yielding dis-
tributions, ipso facto, have norms; therefore they may be included in profiles. But of
what avail to be tenth decile rather than first in a test of zero validity! 

7. It is a fallacy that “anyone can be trained to do anything equally well.” 

8. It is a fallacy that “there is a certain niche in the economic system for which each 
person is designed by nature, if he can only find it.” 
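
The second fallacy above can be illustrated numerically. The following Python sketch uses invented scores, not data from any study cited here, in which success is high at both extremes of a trait and low in the middle. The helper name `pearson_r` and every value are assumptions made for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical trait scores running from one extreme to the other.
trait = [-2, -1, 0, 1, 2]
# Hypothetical success ratings: high at both extremes, low in the middle.
success = [4, 1, 0, 1, 4]

# Straight-line (first degree) relation of trait to success.
linear_r = pearson_r(trait, success)
# Relation of the squared trait to success, which allows for the curve.
curved_r = pearson_r([t * t for t in trait], success)

print(round(linear_r, 3), round(curved_r, 3))
```

With these invented figures the linear coefficient vanishes although the trait determines success completely, which is the situation the caution describes.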


The Inadequacy of First Degree Regression Equations 

Johnson (410) pointed out a fundamental weakness of aptitude testing 
as generally practiced today, namely, that the measures now generally used 
are strictly additive, no provision being made for the possibility that the 
absence of any one trait or characteristic may mean failure in a given field 
of endeavor regardless of the presence of many other favorable character- 
istics. Kelley (415) and Toops (462) described the use of higher degree 
equations, involving multiplicative scores for dealing, among other things, 
with this situation (see Chapter IV). 
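
The contrast between additive and multiplicative scoring may be sketched as follows. This is a minimal illustration under stated assumptions: the simple product used here is only one possible higher degree form and is not claimed to be the equation of Kelley or of Toops; the function names, the weights, and the trait values on a 0-to-1 scale are all hypothetical.

```python
def additive_score(traits, weights):
    """First degree composite: a weighted sum of trait scores."""
    return sum(w * t for w, t in zip(weights, traits))

def multiplicative_score(traits):
    """Product composite: a zero on any one trait gives a zero total."""
    product = 1.0
    for t in traits:
        product *= t
    return product

weights = [1.0, 1.0, 1.0]          # hypothetical equal weights
lacking_one = [1.0, 1.0, 0.0]      # excellent on two traits, absent on a third
moderate = [0.6, 0.6, 0.6]         # moderate on all three traits

print(additive_score(lacking_one, weights), additive_score(moderate, weights))
print(multiplicative_score(lacking_one), multiplicative_score(moderate))
```

The additive composite ranks the candidate who wholly lacks one trait above the moderate candidate, while the product composite ranks him below; this is the weakness of strictly additive measures described above.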

Another method for meeting the situation more adequately is through 
prediction from “patterns” of characteristics. Large scale statewide testing 
programs, for example, are bound to secure fairly large groups of people 
who are “alike,” and studies based on such like-patterned groups should 
yield pertinent data as to the potence of different patterns in predicting 
various criteria of success. Projects now under way in which such studies 
are eventuating or may be expected to eventuate are the statewide intelli- 
gence testing and achievement testing programs of Alabama, Colorado, 
Georgia, Indiana, Iowa, Kentucky, Minnesota, Mississippi, North Carolina, 
Ohio, Oklahoma, Wisconsin, and Wyoming; and the extensive occupational
testing and employment stabilization projects of Minneapolis (455), 
Rochester (409), and Cincinnati. The study of patterns will be greatly 
facilitated by a recently discovered method of identifying and coding pro- 
files (468). Graphic treatment of patterns has been described recently by
Trabue (469, 470), Dvorak (394), and Segel (449, 450). 


Need of a Standard Group for Reporting Validity and Reliability 


Validity and reliability coefficients obtained from different groups simply
are not comparable. The field of vital statistics was in a similar predica- 
ment until the “standard million” concept was brought into use. Death 
rates in Arizona and Massachusetts are not comparable until the effects of 
different age and sex distributions are ironed out. Reports on the reliabili- 
ties and validities of tests will continue to be relatively unintelligible to 
the prospective user until a similar standardization is successful. 
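
The vital statistics analogy can be made concrete with a sketch of direct standardization. The two regions, their age groups, and all counts below are invented for illustration, and the function names are hypothetical; the sketch shows only the general idea of applying each region's own rates to one shared standard population.

```python
def crude_rate(deaths, population):
    """Deaths per person with no allowance for age distribution."""
    return sum(deaths) / sum(population)

def standardized_rate(deaths, population, standard_population):
    """Apply each age group's own death rate to a shared standard population."""
    expected = sum(
        d / p * s for d, p, s in zip(deaths, population, standard_population)
    )
    return expected / sum(standard_population)

# Hypothetical figures for two age groups (younger, older).
region_a_pop, region_a_deaths = [8000, 2000], [16, 100]
region_b_pop, region_b_deaths = [5000, 5000], [10, 250]
standard_million = [700000, 300000]   # assumed standard age distribution

crude_a = crude_rate(region_a_deaths, region_a_pop)
crude_b = crude_rate(region_b_deaths, region_b_pop)
std_a = standardized_rate(region_a_deaths, region_a_pop, standard_million)
std_b = standardized_rate(region_b_deaths, region_b_pop, standard_million)

print(crude_a, crude_b)
print(std_a, std_b)
```

In this invented case the age-specific rates of the two regions are identical, so the crude rates differ only because of age distribution and the standardized rates agree. A standard group for reporting test coefficients would serve the same purpose.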


Need for Interpreting Scores in Relation to the Individual’s Stage
of Development 


The influence of time in the development of aptitude has largely been 
ignored. All tests yield cross-section views of ability. Yet seldom are a 
number of successive testings assembled into an individual “growth curve,” 
a special case of the profile in which a single test function with successive 
retests is graphed instead of tests of additional functions. The state of scien- 
tific advancement at the present is characterized best perhaps by the state- 
ment that the resources of a Croesus probably could not at the present 
moment locate the growth curves of height of twenty individuals measured 
annually from birth to the age of twenty-one. Courtis (382) stated that 
an excellent I.Q. can be determined for an individual by a consideration 
of the rate at which he erupts his teeth. He (381) also pointed out that a 
single test is uninterpretable unless the conditions of motivation are known 
and are uniform for the group. The work of this research worker is de- 
serving of more attention from scientists, educators, and test-builders than 
it has had. The inauguration of cumulative records alone does not neces- 
sarily insure the accumulation of even the basic data for such investiga- 
tions. The units employed must be equivalent over the range of growth; 
the intervals between measurements preferably should be arithmetical and 
uniform for all; the reliability of the measuring instruments and of the 
measuring methods must be high; and the conditions of motivation of the 
subjects must be controlled and be expressed in definite (verifiable) cate-
gories. Considerable effort has been made recently to measure periodically,
interview, and follow up individuals over a period of years. Noteworthy 
researches of this type are Terman’s studies of genius (377); the Pennsyl-
vania study of the Carnegie Foundation (379); the annual tests of children
by Dearborn (401) and the Brush Foundation (461); and the follow-up 
of 2,000 children over ten years by Thorndike (460). 


What Should Be the Criteria of Aptitude Tests? 


Just what criterion an aptitude test should predict has never been settled 
to the satisfaction of discriminating investigators. “Success” is a handy word 
with which a fight may be started in any camp. Is success a matter of 
remuneration and job level or of one’s suitability for the job as judged by 
self and employer? Thorndike (459, 460) has chosen one set of measures; 
the investigators of the London (395) and Birmingham (368) experi- 
ments have chosen another. Macrae (423) has pertinently observed: “The 
truth is that the estimation of actual success is almost as difficult as the 
estimation of potential success.” 

Admittedly, grades are a sorry measure of the success with which an 
individual meets the college situation. Yet these have been used universally 
as criteria for lack of anything better. Grades are a hodge-podge of many 
characteristics of the individual, the instructor, the course or courses taken, 
and the situation. 
That average grades for different individuals are comparable is a dubious
assumption in the light of the variation in the difficulty of courses (402, 
476). As Kitson (419) has pointed out, identical averages may represent 
vastly different performances. A given average for one person may repre-
sent fairly uniform achievement in all subjects taken; for another, the same
average may represent high marks in some courses and low marks in others.

Tyler (471, 472) and his associates have been making rapid progress in 
the matter of measuring educational achievement by defining objectives in 
terms of specific observable behavior and developing efficient methods for 
their measurement. The question of the proper combination, if any, of these 
measures still remains a problem. 

Attack on the problem of making ratings more reliable has been made 
successfully by the American Council on Education, as reported by Brad-
shaw (375), by Stevens and Wonderlic (454), and Richardson and Kuder
(445). The trend appears toward specifics, using statements and phrases 
such as would naturally come from the tongue of the raters. The last-named 
study applied the Thurstone scaling technic of equal-appearing intervals 
in the development of a list of statements which are marked as being true 
or not true of the individual rated. 


Evaluations of Test Programs 


Evaluations of the results of four aptitude testing programs over a period 
of years have appeared. In one of these, reported by Thorndike, no voca- 
tional counseling took place. In the other three, vocational counseling was 
given to the subjects on the basis of tests, school records, and interviews. 

Thorndike and his associates (460), in an extensive follow-up study of 
children, undertook to find out how well school records up to the age 
of fourteen and test scores at the age of fourteen would predict future 
educational and industrial careers. They concluded from their data that, in 
general, school grades reached, scholarship marks, intelligence test scores, 
or any combination of these, predict later success in school fairly well, but 
that little can be predicted, by means of the tests employed, as to vocational 
success to be achieved by the age of twenty-two. The reservation is made 
that such prediction may be better for a later age when the superior indi- 
viduals are out of school and well along in their careers. Of note is the 
observation that there is much indirect evidence that employers do not fit 
wages to services very accurately in the case of these young workers, and 
that there is direct evidence that they pay substantial premiums for mere 
size in the case of clerical workers. The conclusion is reached that “even if 
the correlations with services rendered should be as low as those with wages 
received (they probably will be much higher), test scores will be much 
better than prejudices and superstitions.” The findings of this inquiry 
have raised a storm of heated criticisms on a number of counts (436, 423), 
chiefly on the point that the experiment cannot be taken as a test of the 
possibilities of vocational guidance since no attempt at guidance was 
included in the research program. 

A second follow-up study of children who had been given vocational ad-
vice at the London Institute of Industrial Psychology was reported by 
Macrae (424). The earlier and more extensive follow-up by the London 
Institute was reported in 1931 by Earle and others (395). The later study 
discovered 79 percent of the guidance predictions to be “correct” for the 
group of nearly 200 subjects. Any prediction was classified as “correct” 
if a guidee who succeeded had followed the Institute’s advice and if a 
guidee who failed had rejected it. 
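
The scoring rule just described reduces to a simple agreement check, sketched below. The ten cases are invented and do not reproduce the Institute's data; the function names are hypothetical.

```python
def prediction_correct(followed_advice, succeeded):
    """Correct when following goes with success or rejecting goes with failure."""
    return followed_advice == succeeded

def percent_correct(cases):
    """Percentage of (followed_advice, succeeded) pairs scored as correct."""
    hits = sum(1 for followed, ok in cases if prediction_correct(followed, ok))
    return 100.0 * hits / len(cases)

# Hypothetical follow-up records: (followed_advice, succeeded).
cases = (
    [(True, True)] * 6      # followed the advice and succeeded
    + [(True, False)] * 1   # followed the advice and failed
    + [(False, False)] * 2  # rejected the advice and failed
    + [(False, True)] * 1   # rejected the advice and succeeded
)

print(percent_correct(cases))
```

Under this rule a guidee who rejects the advice and fails counts in favor of the prediction just as one who follows it and succeeds.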

A similar follow-up of individuals who had been given vocational guid- 
ance by the Australian Institute of Industrial Psychology was reported by 
Mirk (427). It was found that 86.6 percent of the subjects had acted in 
accordance with the advice given. Of these, 92.3 percent considered them- 
selves wholly satisfied, 5.1 percent partially satisfied, and 2.6 percent 
wholly dissatisfied with their work. Of the subjects who had not acted in 
accordance with the advice, 8.3 percent reported they were satisfied, 25 
percent partially satisfied, and 66.6 percent dissatisfied with their work. 

A control group of children who received advice based on interviews 
only was used by Allen and Smith (368) and Allen (369) to check the 
results obtained from an experimental group of children who were given 
counseling with the aid of psychological tests upon leaving school. The 
findings indicate more satisfactory results from the use of tests in guidance 
than from guidance without the use of tests (control group). Thorndike 
has suggested that the favorable results obtained are due to the fact that
employers and children were “influenced” in their ratings of suitability
by the recommendations and whatever discussion accompanied them. To 
this Macrae (423) retorted that there was an equal opportunity for similar 
discussions with members of the control group. 


Statewide Cooperative Testing Programs 


Statewide testing programs have developed in a number of states as 
reported by Segel (451) and Proffitt (444). Such programs are directed 
toward measuring the achievements and capacities of students in order 
that they may be helped to the most effective development and participa- 
tion in today’s complex society, and are, in a very real sense, directed 
toward evaluating individual aptitudes. 

In addition to directing attention to the importance of individual differ- 
ences and having measures of them for consideration in the choice of 
courses and vocations, these programs are intended, among other things, 
to carry new objectives of education to the schools; to put scholarship on a 
par with athletics as to attractiveness; and to locate promising students 
for purposes of college recruiting. Minnesota and Ohio are seeing in the 
intelligence scores gathered a means of transmuting high-school marks to 
a common basis by the system hereinafter described, and for obtaining for 
prognostic purposes “character quality” inherent in high-school marks. 
A student of low intelligence must be a consistently hard worker in order 
to make good grades over a four-year period, while an intelligent student
with consistently low grades may probably be classed as lacking in academic 
interest and drive. 

A minimal guidance program as outlined for the state of Ohio has been 
described by Toops (463, 466). Particularly noteworthy in the Ohio pro- 
gram is the use of a guidance manual, Opportunities in Ohio Colleges, 
which places in the hands of the high-school student information, secured 
by a uniform questionnaire outline, about the specific colleges of the state. 
Included is a device by which the student rates the several colleges which 
he is considering on a number of specifics, the relative importance of which 
was decided by a vote of presidents of Ohio colleges. 

While statewide programs have as yet given little attention to the objec- 
tive evaluation of personality characteristics, such a development is needed. 
The Ohio College Association is at present considering a social participa- 
tion inquiry partially to meet the need. It is proposed to make a survey of 
students in a large number of Ohio colleges in order to determine what 
strengths, deficiencies, aims, and purposes are associated with participation 
or non-participation in extracurriculum activities and to determine the 
pattern of traits that identify those individuals who need (a) personality 
development through extracurriculum activities and (b) curbing of extra- 
curriculum activities in the interest of a well-rounded personality program. 

Noteworthy points in reports of statewide testing programs are as follows: 

In Wisconsin, the issue of freeing the public schools from the pattern 
of prescribed entrance units has been raised since the tests are so much 
the better prognosticators of college success than the patterns of subjects
studied (405, 407). 

It has been pointed out in Ohio that the colleges of the state could be 
recruited twice over from students above the 40th percentile in the distribu- 
tion of intelligence scores of Ohio College Association freshmen, and thus 
the need for thousands of scholarships is squarely raised (465). 

The Pennsylvania study points out that forgetting is an important phe- 
nomenon of college learning, seniors often knowing less than freshmen in a 
given topic, and that the products of colleges are no more uniform than the 
products of the secondary schools, thus implying the need for transmuting 
the marks of the colleges as well as of the high schools (379). 

In Minnesota, a simple transmutation of averaged high-school marks has 
been shown to be of as much value as intelligence tests for predicting col- 
lege success, thus suggesting this as a function which should be performed 
annually in all states (411). 

All in all, the statewide testing movement bids fair to lead to a new kind 
of educational statesmanship in which the total talents and attainments of 
state populations shall be as carefully considered by a state’s guidance 
director as are those of the citizenry of nations in time of war. When such 
shall have been done, it is inevitable that consideration be given to the 
contention of Clark (380) to the effect that the present economic dilemma, 
and its increasing complexity whenever the growth of population shall have
stopped, can be solved only by a national occupational planning board.
This board would have as one of its chief jobs the creation of thousands
of new occupations, mainly service occupations, and would subsidize
preparation for and entry into them.


Scholastic Aptitude 


Wagner (475) made a general survey of the literature on college 
scholastic prediction. Some of the variables which have been studied are 
discussed below. 

High-school grades—Investigators are generally agreed that better pre- 
diction of college achievement, theoretically, may be obtained from high- 
school marks than from intelligence tests, and, in fact, several studies have 
obtained the highest correlation from them. Among these studies are those 
of Crawford (384), Douglass (392), Edds and McCall (396), and 
Schleier and Schreiber (447). This is exploiting the reliability of marks at 
different ages, and so, presumably, as the reliability of marks in both 
schools is improved, the limit will approach the corrected-for-attenua- 
tion coefficient, whose limit presumably is 1.00 when the curriculums of 
the two schools are identical but couched at different levels of maturity. 
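
The corrected-for-attenuation coefficient mentioned above follows the familiar formula in which the observed correlation is divided by the square root of the product of the two reliabilities. The sketch below applies it to hypothetical figures; none of the numbers comes from the studies cited, and the function name is an assumption.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores.

    r_xy: observed correlation between the two sets of marks
    r_xx, r_yy: reliabilities of the two sets of marks
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical observed correlation of high-school with college marks,
# and hypothetical reliabilities of each set of marks.
observed = 0.55
corrected = correct_for_attenuation(observed, r_xx=0.70, r_yy=0.75)

# With perfectly reliable marks the correction leaves the value unchanged.
unchanged = correct_for_attenuation(observed, r_xx=1.0, r_yy=1.0)

print(round(corrected, 3), round(unchanged, 3))
```

As the reliabilities of the marks rise toward 1.00, the observed coefficient rises toward the corrected one, which is the sense in which the latter serves as a limit.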

Such use of high-school marks in a given college, where the students 
come from many high schools, generally implies the transmutation of marks 
to a common basis. Toops (467) has pointed out that “the two most im- 
portant reasons why marks are non-comparable, we may guess, are that 
(a) the distributions of marks are not identical and (b) the intelligence 
distributions of the pupils may not be equal, even if the former condition 
obtains.” He has therefore proposed the principle that the average marks 
of all pupils of a given narrow range of intelligence, in whatever high 
school, shall be transmuted to a common transmuted score. “If one of these 
schools, say A, be designated as a standard school (it may be a hypothetical 
school) the marks of all other schools may be transmuted to the basis of 
the standard school. Thus all marks of all pupils in all schools, having 
been made comparable with the marks in school A, will be comparable 
with one another. And, accordingly, it follows, for example, that a college 
may be built up (compounded) of the past populations of many schools 
with known (equated) secondary-school success; and these secondary- 
school transmuted marks confidently may be expected to prognosticate 
college success very much better than will the untransmuted marks, or the 
transmuted marks arrived at by any system which ignores the intellectual 
differences of different schools. 

“Eventually the question would arise: ‘Should the school be rated on 
all subjects, or should the “non-intellectual” ones, like manual training, 
shop, art, and domestic science, be excluded from the averages?’ It will be 
a progressive move when the pupil’s transmuted secondary record which
goes collegeward is not a ‘rank of 5 in a class of 63,’ nor even a single
transmuted score as described, but instead several (italics ours) transmuted
scores, saying, in effect, ‘On a comparable standard, this pupil is good in
science, mediocre in mathematics, and excellent in English.’ ” Evaluation 
of patterns of high-school credits and transmuted marks by the process 
already described would then be in order. Probable success in specific 
curriculums should be the aim of prediction for the purpose of giving 
more specific educational guidance than is now feasible or wise, for it 
is in this realm as observed by Link (420), that the most promise for 
guidance lies. 
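
One way the transmutation principle might be carried out is sketched below. The text does not give the computational details of the proposal, so the additive shift used here is an assumption made purely for illustration: within each narrow intelligence band, every school's marks are moved so that the band average matches that of the standard school. The school names, band label, marks, and function names are all hypothetical.

```python
from collections import defaultdict

def band_means(records):
    """Mean mark for each (school, intelligence band) pair."""
    totals = defaultdict(lambda: [0.0, 0])
    for school, band, mark in records:
        totals[(school, band)][0] += mark
        totals[(school, band)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

def transmute(records, standard_school):
    """Shift each mark so band averages match those of the standard school.

    Assumes the standard school has pupils in every intelligence band used.
    """
    means = band_means(records)
    transmuted = []
    for school, band, mark in records:
        shift = means[(standard_school, band)] - means[(school, band)]
        transmuted.append((school, band, mark + shift))
    return transmuted

# Hypothetical records: (school, intelligence band, average mark).
records = [
    ("A", "mid", 80), ("A", "mid", 84),   # standard school
    ("B", "mid", 90), ("B", "mid", 94),   # a more lenient school
]

for row in transmute(records, standard_school="A"):
    print(row)
```

In this invented case pupils of equal intelligence in school B receive marks ten points higher than in school A, and the transmutation removes that difference while preserving the order of pupils within each school.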

Intelligence tests—Many correlations of intelligence tests with college 
scholarship are reported in the literature. In recent years, most of the 
correlations in representative studies range from .50 to .65 (378, 387, 434, 
464, 474). 

Colleges and universities today are being called on much more fre-
quently by business, industry, and school for the intelligence records of
university students. If high-school and university scholarship were trans- 
muted to a comparable basis as described above, employers would find 
these of as much interest as the intelligence tests. One reason for this 
present interest in the latter is that the use of percentiles in interpreting 
scores makes them quickly understood by the layman. Another reason is 
the predominant use of a certain half dozen outstanding intelligence tests. 

The early fears of some that the centralized control of test programs 
would lead to lack of progress would seem to be realized if we judge by the 
increase in the validity coefficients alone. But the situation appears better 
on second inspection. After all, the making of better tests is a technical
matter, necessitating funds, machinery, and time for the careful item analysis 
which is required. There is some likelihood that the lack of improvement 
in validities during the depression is a function of the upward swing in 
average intelligence of entering college students (458). Since the range of 
intelligence has decreased, it has required better tests to maintain the old 
validities. 
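
The effect of a narrowed range upon validity can be sketched with the textbook formula for direct selection on the test. The formula is not drawn from the study cited (458); it assumes a linear relation and equal scatter about the regression line, and the figures and function name below are hypothetical.

```python
import math

def restricted_validity(r_full, sd_ratio):
    """Validity expected after the range of test scores is narrowed.

    r_full: validity in the unrestricted group
    sd_ratio: standard deviation of test scores in the restricted group
              divided by that in the unrestricted group
    """
    return r_full * sd_ratio / math.sqrt(
        1 - r_full ** 2 + (r_full ** 2) * (sd_ratio ** 2)
    )

# Hypothetical test with a validity of .60 in an unselected group.
for ratio in (1.0, 0.85, 0.70):
    print(ratio, round(restricted_validity(0.60, ratio), 3))
```

Under these assumptions the same test shows a lower coefficient in the more highly selected group, so that an unchanged coefficient would itself imply a better test.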

Another explanation for the rising average intelligence score, less 
plausible to the writers, is that higher scores are being obtained because 
present-day students have been “test-broken” through their previous school
experiences with objective tests. Many contend that the college students 
of today are much more serious-minded than those of a half dozen years 
ago. An interesting corollary of higher selection as to intellect and purpose, 
is that, with no change in scholastic standards, failing (i.e. D and E) 
marks should seldom if ever be given. 

A notable application of intelligence tests at the university level is that 
of standardizing the marking system of individual college professors as 
described by Ogan (433). The individual intelligence scores and marks of 
his students in other courses are used to help the instructor to determine 
whether or not his grading standard is relatively low or high. Whinery 
(476) has also used intelligence scores as the basis for determining the 
relative difficulty of courses and for making comparable the marks from
course to course.

Achievement tests—Validities comparable to those obtained from intelli-
gence tests have been reported from various subject-matter tests such as
the Iowa Content Examination (473), the Kentucky Classification Test 
(434), and the College Entrance Examination Board’s test (376). 

Study time—Although May (425) found, several years ago, a distinct 
relationship between degree of application as measured by reported study 
time and scholarship, later studies have failed to confirm this finding. Hart-
son (402), for instance, found a correlation of -.15 between estimated
hours of study and grades. L. Jones and Ruch (412) reported one of -.28.
In both studies the improvement in prediction is negligible when this meas- 
ure is combined with intelligence scores. One reason for this lack of rela- 
tionship is that students do not estimate their study time correctly. As 
revealed by Stokes and Lehman (456), the poorer students tend to over- 
estimate their time more than do the better students. The less able students, 
on the whole, compensate somewhat by studying more and, possibly, by 
electing a larger proportion of easier subjects. 

Personal data—With few exceptions, personal data have proved to be 
of little use in prognosticating college achievement (372, 457). Age at 
entering college, however, has generally been found to have a significant 
negative correlation with college grades (384). Age, however, is simply 
crudely reflected intellect. 

Personality characteristics—A field of investigation in the college area 
which is demanding attention is that of the more adequate measurement of 
attitudes, interests, and motives. A few scattered studies have been made for 
the purpose of predicting college grades. Hartson (403) found that a blank 
for recording student interests had validities of .365 and .231 for men and 
women respectively. Holcomb and Laslett (406) found a validity of .322 
in predicting the grades of freshman engineers from engineering scores on 
the Strong Interest Blank. Segel and Brintle (452) reported a marked 
relationship existing between vocational interest scores on the Strong In- 
terest Blank and differences between marks in college subject groups, as 
well as differences between achievement test results on the Iowa High School
Content Examination. Scores on such blanks have been found to be easily 
faked, making them of value only in case the answers are bona fide. In- 
vestigators have found that interest and attitude scales validated on one 
group frequently lose most of that validity when given to other groups 
(393). The American Council on Education record system calls for a care- 
ful cumulative record of specific interests. However, such records, desirable 
as far as they go, do not readily admit of any but subjective interpretation, 
although the trail has been blazed by Bradshaw (374), who showed several 
years ago that “behavior specifics” are of far more value in character 
measurement than the scores derived from any character rating blank or 
scheme. 

Combinations of various measures—The multiple correlation coefficients 
based upon the best combination of a number of prognostic measures were 
reported by Wagner (473) to be .67 for the investigations reviewed. Craw-
ford (384) reported validity ranging from .68 to .74 based on a combina-
tion of transmuted high-school scholarship, College Entrance Examination
Board averages, scholastic aptitude test scores, and age at entrance. Hart- 
son (403) obtained a validity of .68 from the use of five tests. Byrns (378), 
Edds and McCall (396), and Douglass (392) reported multiple correla- 
tions of .63, .81, and .64, respectively. These are encouraging when viewed 
in the light of the .35 and .40 which typified the validity of Army Alpha 
some fifteen years ago. 
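
How a combination of measures can exceed the validity of its best single member may be sketched for the two-predictor case, using the standard formula for a multiple correlation from the three zero-order coefficients. The coefficients below are hypothetical and are not taken from any of the studies named; the function name is an assumption.

```python
import math

def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2: correlations of each predictor with the criterion
    r12: correlation between the two predictors
    """
    return math.sqrt((r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2))

# Hypothetical validities of two measures and their intercorrelation.
combined = multiple_r(r1=0.55, r2=0.50, r12=0.40)

# The same validities when the two measures overlap more heavily.
overlapping = multiple_r(r1=0.55, r2=0.50, r12=0.80)

print(round(combined, 3), round(overlapping, 3))
```

With these invented values, the gain from combining measures shrinks as the measures become more highly intercorrelated, which is why batteries of similar tests add little to prediction.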


Scholastic Aptitude in Specific Fields 


The reader is referred to Wagner’s review (475) for studies on the 
prediction of achievement in specific subjects. Wagner’s own work (474) 
along this line is notable. 

Prediction in various curriculums is taken up below: 

Dentistry—Schultz (448) investigated the relationship of performance 
on the Miles Two-Story Duplicate Maze and on the American Council 
Psychological Examination to the success of 90 second-year dental students 
in their courses as measured by eight different criteria. A score of maze 
adaptability produced the highest validities, the correlations being from 
.247 to .372 with the various criteria. Keller and Weber (414) reported 
on a battery of tests which all persons seeking to prepare for the dental 
profession in Germany must take. 

Engineering—Of the tests which Holcomb and Laslett (406) tried out, 
the highest validity in predicting grades of first-year engineering students 
was found to be that of the American Council Psychological Examination 
with a correlation of .555. Other correlations were as follows: McQuarrie 
Mechanical Aptitude Test, .478; the A-S Reaction Study, -.41; Stenquist
Mechanical Aptitude Test No. 2, .428; and engineering scores on the Strong 
Interest Blank, .322. No combination of the scores of the different tests 
was attempted. Grades in freshman mathematics and freshman mechanical 
drawing courses were found by Wilson and Hodges (477) to correlate .42 
and .406, respectively, with grades in advanced engineering courses, while 
the validity of the Otis Advanced Intelligence Scale was .382. A multiple 
correlation of .69 with grades in advanced courses was obtained for four 
variables: grades in three freshman courses and the Otis test. 

Law—The Ferson-Stoddard Law Aptitude Examination was found by 
Gaudet and Marryott (400) to be better than freshman grades in predicting 
grades in law school. 

Medicine—Moss’s reports (430, 431) on the use of his scholastic aptitude
test for medical students, based upon a four-year study of 1,000 students 
and a three-year study of 5,000 students, concluded that “the aptitude test 
scores give a somewhat better prediction of what the student can do in 
medical school than any other single one.” Cowdery and Ewell’s study 
(383) of 45 first-year medical students confirmed this assertion. The Moss 
test correlated .636 with grades in medical school. When the test and pre-
medical grades were combined, the correlation was .764 with medical school
grades.

Music—In a careful study by More (429), fifteen group tests of musical
ability (ten being standard tests and five being new ones devised by the 
investigator) and an intelligence test were given to all entering freshmen 
of the School of Music of the North Carolina College for Women for the 
years 1927-30. A multiple correlation of .73 was obtained between college 
freshman marks in music courses and the four tests of the finally selected 
(arbitrary) battery. Stanton (453) found that a battery of tests of musical 
ability given to freshmen predicted successful graduation. 

Nursing—Rosenstein (446) and Frankford (399) both found a rela- 
tionship between intelligence test scores and trainability of student nurses. 
The former reported that 75 percent of those individuals dropped from 
the training school for scholastic reasons were below the 50th percentile 
on the American Council Psychological Examination. However, when he 
compared high and low groups selected by this test, he found practically 
no difference between the two groups on efficiency ratings in practical work. 
An extensive study by W. B. Jones and Iffert (413) of 777 students revealed 
a nursing aptitude test to be most highly related to their efficiency record, 
with the psychological examination a close second. 

Teaching—The scores of 135 senior and graduate students on the Coxe- 
Orleans Prognosis Test of Teaching Ability were compared by Dodd (391) 
with scholarship and with supervisors’ ratings of teaching ability at the 
close of the practice-teaching period. Correlations of .425 with supervisors’ 
ratings and .636 with scholarship were obtained when raw scores rather 
than weighted scores were used. Part II of the test—a true-false test of 
professional interest—had a larger correlation (.454) than the entire test 
with supervisors’ ratings. 

As a general criticism of studies on aptitude it may be said that most of 
them make two probably rather invalid assumptions: 


1. The aptitudes required for learning an occupation are the ones required for its
successful plying later. 


2. The judgments (mainly or solely) of supervisors and overseers are adequate
measures of the success which the tests were designed to anticipate. Regarding the 
latter point, Bowman (373) has shown that probably all of the teaching aptitude tests 
thus far devised, numbering many dissertations, are largely useless because no group 
of teachers in such an investigation has ever been rated as to true “teaching ability.” 


Mechanical aptitude—This subject has received relatively little attention 
since the researches of the Minnesota group and Cox. 

Clerical aptitude—O’Rourke (435) conducted an extensive research on 
the development of tests for the selection of stenographers and typists in 
cooperation with a number of industrial firms. Correlations of .71 and .76 
with efficiency ratings were reported for employees in two different com- 
panies. The National Institute of Industrial Psychology (428) used its 
clerical test, consisting of seven parts, in the United States. Correlations 
of .87 and .94 are reported for two small groups of 18 and 28 respectively! 
The Minnesota Clerical Test, a test of 800 items involving name-checking
and number-checking, has been developed by Andrew, as reported by Pond
(441). It correlated .65 with ratings based upon personal history items, 
and .37 with supervisors’ ratings of 138 clerical workers. The retest relia- 
bility was found to be .86. That the author was successful in getting a 
measure relatively independent of intelligence is shown by the low correla- 
tion of .25 with intelligence. Further experimentation has demonstrated 
this test to be the most sensitive of a number of clerical tests in distinguish- 
ing the various grades and classes of clerical workers and in differentiating 
between employed and unemployed clerical workers (370). A correlation 
of .65 between ratings for 54 clerical workers in a mail order house and 
scores on a test for ability to add was reported by Kinney (417). Pond 
and Bills (440) have found a significant relationship between intelligence 
test scores and grade of clerical work engaged in. 

Miscellaneous—Diehl and his associates (390) found that of the many 
variables investigated, only name-checking, number-checking, and physical 
defects were related to captains’ ratings of policemen. 

Lovett and Richardson (422) reported on the significance of various 
types of test material in distinguishing sales ability and sales managerial 
ability on the basis of an item analysis of a test battery of about 900 items. 

The problem of eliminating incapable automobile drivers has been re- 
ceiving a good deal of attention all over the world, but most articles on the 
subject are discussions of the need for some method of spotting the danger- 
ous drivers, or of the kind of tests which might be useful. A few experimental 
studies are reported in the literature. Miles and Vincent (426) found a 
correlation of .77 between driving efficiency and the test for motor drivers 
conducted by the National Institute of Industrial Psychology. Forbes (398) 
reported the use of an apparatus which reproduced in miniature in the 
laboratory the conditions of driving through traffic. Thirty-one commercial 
drivers and fifty university students were tested. The test had some dis- 
crimination in picking out the accident-prone commercial drivers from the 
others. 


Bingham (371) reported a study of twenty-four variables in relation to 
accident-proneness of street-car motormen and bus operators of the Boston 
Elevated Railway. Accident-proneness was found to be associated with 
lack of aptitude, personality defect or uncooperative attitude, and health 
defects. Application of the resulting selection method effected a decrease 
of 43 percent in collision accidents over a period of four years. 

Accident-proneness may be considered as the tendency to have accidents. 
In this field, Henig (404) found that the more intelligent boys in a trade 
school had fewer accidents. Farmer and his associates (397), however, 
did not come to the same conclusion in their study of dockyard and R. A. F. 
apprentices. They did, however, find some relationship between accident 
rate and performance on their tests of sensory-motor coordination. 

Studies and discussions on aptitude for the occupation of aviation pilot 
are reported from France, Italy, Spain, Russia, Argentina, and the United 
States, but few give objective results. DeFoney (388, 389) reported a 
study of 628 individuals who were given a psychological examination 
previous to training as pilots. The low group, given training in spite of 
their low ratings on the examination, had a “crash rate” twice as great as 
that of the men who had high ratings. 


Test Administration 


Effect of long test periods—The question of whether or not long test 
periods result in a loss of efficiency led to a study by Noll (432) of students 
taking a three-hour battery of tests. Performance on the Peterson Uni- 
form Equation Completion Test was taken as the measure of efficiency 
before and after the main test period. The students were found to be slightly 
more efficient after the testing than before. 

Time-limit versus work-limit—Workman (479) and Porter (442) found 
that administering the Ohio State Psychological Examination by the work- 
limit method produced higher validities than the time-limit method in the 
prediction of college scholarship. 

Efficiency test forms and scoring methods—J. C. Peterson and others 
(437, 438, 439) have been prolific in the matter of producing efficient 
methods of test scoring. Their methods call for answer sheets separate from 
the test proper. One method of scoring consists of punching holes through 
batches of tests in the correct answer position; scoring is then a matter 
of counting correct answers as indicated by the perforations. Another type 
of answer sheet is treated so that heating the test brings out a sympathetic 
ink which has been printed on the correct answer positions. Still another, 
useful for self-scoring test use, is made so that when the subject dabs the 
intended answer position with a piece of moistened felt it turns blue if 
the answer is correct and red if it is wrong. The characteristic of an imme- 
diate check upon the student’s answer makes this kind of test a promising 
one for use in the learning situation. 

The Clapp-Young Answer Booklets are sealed answer folders with carbon 
applied inside so that when the subject answers a question correctly an 
“X” is registered in a square. Scoring then becomes a matter of opening 
the blank and counting. 

A scoring machine which quickly scores each test, prints the score on 
the test, indicates the wrong answers, and makes a record of the number 
of times each item is missed, has been invented by Pressey (443) and Little 
(421). The subjects use hand-punches in indicating their answers on the 
answer card. In the classroom situation, students’ tests are scored as soon 
as they are completed. When everyone is through, the instructor is able, 
by looking at the item record, to discover the items on which the most 
errors were made and to discuss them immediately. 

“Krexit” is the trade name of a machine which is used to print red circles 
in the correct answer position on the answer cards after the test has been taken. 
This is similar to a method used by Toops for printing-press scoring which 
blocks out all the incorrect answers. Scoring is reduced to the operation of 
counting the number of right answers. 

Cuff (386) developed a device for scoring answer pads in which the 
subjects have indicated their answers by punching holes in the answer 
sheets with pencils. Small metal cylinders are dropped through the holes 
and weighed, the author reporting that speed and accuracy of scoring by 
this method are much greater than when scorers count the right answers. 

A test booklet which is particularly well adapted to large-scale test pro- 
grams has been developed by Toops (464). The Ohio State University 
Psychological Examination Form 18 has been so devised that the answer 
pads may be removed and attached at will, allowing the test proper to be 
used many times, and bringing the per-pupil cost of a testing program to 
a very low figure. The test subjects use pointed styli for piercing the test 
pad to indicate the intended answers. The test pads are then taken apart, 
yielding three identical pre-scored copies for each subject. These copies 
serve as records of the students in offices of the principal, superintendent, 
and central research bureau. The tests may be easily scored by the pupils 
themselves since the correct answers are indicated inside the answer pad, 
and since each copy of a given pupil’s three answer sheets bears no means 
of identification other than a common printed individual serial number. 
Accordingly, when a given pupil’s three sets of answers are reassembled 
the teacher may easily check all three records, which, agreeing, render 
unnecessary additional official scoring, and supply three correct scored 
copies for as many different offices. 


Summary 


Several heartening signs of vitality are apparent in the aptitude testing 
field. One of these is the growth of cooperative statewide testing programs 
directed toward measuring the achievements and capacities of young people 
in order that they may be aided to the most effective individual develop- 
ment. The follow-up and evaluation of the results of experimentally- 
controlled testing and guidance programs are equally significant of the 
times. Another is the effort toward obtaining more adequate criteria of oc- 
cupational success through defining more specifically the objectives of 
education, and through the development of more valid measures of in- 
dustrial success. Still another is the marked progress being made in the 
matter of more efficient test forms and administration. 

A number of characteristics of aptitude testing when weighed in the 
scales are found wanting. Developments in the field to be particularly de- 
sired are the use of more adequate methods for combining tests and test 
items, the use of standard groups for reporting validity and reliability in 
order that published coefficients shall be comparable, and the interpreta- 
tion of test scores in terms of the individual’s position on his own growth 
curve. The literature bears evidence to the fact that the problems are coming 
to be appreciated and that we may hope for their eventual solution. 

CHAPTER IV 


Test Construction and Statistical Interpretation 


Recent publications show a growing interest in and dependence upon 
statistical method. Educational research cannot go full steam ahead until 
the main body of the profession has acquired at the very least the mathematical 
and statistical competence necessary to understand the appearing 
contributions. 


Teaching of Statistics 


Rather than further indulging in the popular pastime of disparaging 
the amount of mathematics necessary for educational research, attempts 
are now being made, notably by Brown (492), to determine the precise 
mathematical difficulties under which students of education taking statis- 
tics labor; and to escape these difficulties by a “psychological” text, em- 
bracing a series of self-scored diagnostic tests intended to determine and 
guide the student’s specific needs for review and covering the essential 
mathematical topics (605). Walker and Durost (606) also sensed the need 
for three-dimensional models for the teacher’s aid in making clear the 
concepts of statistics. Camp (495), a mathematician, has produced an ex- 
cellent text on the essentials of mathematics for the study of statistics 
which does not presuppose a mathematics major preparatory to its study. 


Estimates 


The estimation of results is a well-recognized part of a good statistical 
training in order that one may not accept erroneous and ridiculous results, 
the product of computational error. That estimation can be reduced to a 
science was shown by Walker and Sanford (604); and that in certain cases 
estimation may replace actual computation was shown by Scates and 
Noffsinger (570). It would be quite worth the while to have the topic 
developed carefully in all its branches. 


Patterns of Traits 


Perhaps the most significant recent trend in statistics, in the reviewers’ 
opinion, is the growing appreciation that all the factors entering into the 
educational business—human traits, educational environments, educational 
results, and educational costs—are not separate and isolated variables but 
more accurately may be represented by profiles. We must speak of patterns 
of ability (see Chapter III); environmental patterns; educational-result 
profiles; and patterns of student costs. The clinical analog is “types.” But 
“types” have several difficulties, not the least of which is the “word-handle” 
difficulty. Each type had to have a name—for the requirements of language 
seem to demand such—and either over-simplified human personality if 
only a few patterns were named; or, venturing in the other direction, were 
laughed out of court because of the multiplicity of names originated and 
the inability to specify, precisely, of what traits a given type was com- 
pounded. The statistical difficulties have been largely solved by Toops 
(595) who proposed a system of addends, mathematically derived, 
whereby each one of all possible patterns of profiles may be coded (and 
readily decoded) by a single unique code number. Accordingly, whenever 
the traits of personality shall have been analyzed into the half dozen, 
more or less, unique traits of which personality is compounded, we shall 
be able to talk of a person whose profile number is 318. Such a person 
differs from a person of profile number 317, say, just one degree (say 
tertile) on one of the unique tests. In wholesale statewide or nationwide 
testing many persons of each such profile number will be located, and 
these, later occurring in many “natural experiments”—going to college, 
getting married, becoming college professors—will yield each their several 
“probabilities” of “success.” Accordingly, it may be safe to prophesy that 
we may expect shortly to see more of this “single variable” type of 
experimentation. 
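
The idea of coding a whole profile as one number can be illustrated with a minimal sketch. It assumes, purely for illustration, that each trait is scored in tertiles (0, 1, 2) and that the addends are simple positional weights (powers of three). The addends actually proposed by Toops (595) are not given in this review and may well differ, and the function names below are hypothetical.

```python
from typing import List, Sequence


def make_addends(levels: Sequence[int]) -> List[int]:
    """Positional addends: the addend of a trait is the product of the
    numbers of levels of all traits that come after it."""
    addends = [1] * len(levels)
    for i in range(len(levels) - 2, -1, -1):
        addends[i] = addends[i + 1] * levels[i + 1]
    return addends


def encode_profile(profile: Sequence[int], levels: Sequence[int]) -> int:
    """Turn a profile (one score per trait) into a single code number."""
    if len(profile) != len(levels):
        raise ValueError("profile and levels must have the same length")
    for score, n in zip(profile, levels):
        if not 0 <= score < n:
            raise ValueError("score outside the range of its trait")
    return sum(s * a for s, a in zip(profile, make_addends(levels)))


def decode_profile(code: int, levels: Sequence[int]) -> List[int]:
    """Recover the profile from its code number."""
    profile = []
    for addend in make_addends(levels):
        score, code = divmod(code, addend)
        profile.append(score)
    return profile


# Six hypothetical traits, each scored in tertiles (0 = low, 2 = high).
LEVELS = [3] * 6
code = encode_profile([1, 0, 2, 2, 1, 0], LEVELS)
profile = decode_profile(code, LEVELS)
```

With six tertile-scored traits this scheme yields 729 distinct code numbers, each decodable back to exactly one profile, which is the property the text relies on when it speaks of tabulating outcomes for all persons sharing a profile number.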

The psychologists and test-makers are not alone in their interest in 
patterns of human ability. The sociologists, interested in the traits and 
the resulting behaviors of societies, are beginning to see that “like-constituted” 
groups may be expected to have “like-behaviors.” Burt (494) 
devised a machine, essentially a counting Findex, called the selecto-meter, 
to locate, in a peg-represented list of persons, these like-constituted 
individuals. Beyle (488), a political scientist, has devised technics for 
sorting out the “like-minded” individuals among a group of congressmen, 
so that their future voting may be anticipated on the basis of their “attitude 
types.” There is more than a faint suggestion that congressional speeches 
might be dispensed with, not to mention the actual mechanics of voting, 
if we could know, in all their statistical ramifications, just whom we had 
voted into office last November! 

Brolyer (491) raised a fundamental question when he asked, and solved, 
the query, “How reliable is a profile?” 

The selection among profiles of human ability which the occupations, 
progressively with time, effect, has been attacked by B. J. Dvorak (513). 
This publication is another evidence of the growing awareness among 
research workers that the composition of groups in terms of profiles, and 
of greater groups in terms of their component lesser societies, each with 
its component traits and patterns of individuals, is a profitable field for 
exploration. Thus far the societies to which a man belongs have been lett 
out of our regression equations, which, moreover, are representations by 
man’s ability rather than a man’s. 





Statistics of Crowds and Societies 


The statistics of groups, crowds, and societies, in the light of the traits 
of the individuals of which such are composed, are in the main yet to be 
developed, although the basic formulas have been available for some time 
and occasionally are rediscovered or given a new twist, as in a report of 
Horst (535). 


Equational Representation of Aptitude 


The sensed inadequacy of the basic technics has given rise not only 
to such developments as the above, but also to doubts as to the 
adequacy of the simple regression equation concept, which long has been 
under fire, particularly on the part of those more influenced by poetic, 
philosophic, or clinical case history zeal and considerations than by the 
rigorous demands of statistical logic. Such criticisms eventuate in little 
that may be put to a statistical test until such a concept can be couched in 
the form of an equation. Kelley (541) and Toops (592) have advocated 
regression equations of the multiplicative order, i.e., involving products 
of test scores in place of their weighted additive sums as in the ordinary 
regression equation. These, when applied to the problem of predicting 
college success to date have not worked a revolution in the validity co- 
efficients (528, 565). Such are at least one answer to the criticism of John- 
son (540) who deplored the use of additive regression equations only. Such 
an equation may take at least some account of such observations as these: 


1. An aptitude may be zero even if all but one of its “determiners” are present and 
favorable, if that one is of zero amount. 


2. Concretely, a student’s scholarship may be zero, no matter how bright he is or 
how hard he studies, if the text is so difficult that he cannot at all comprehend it. 
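
The contrast between the two kinds of equation can be sketched as follows. This is only an illustration of the general idea of products versus weighted sums; the weights, exponents, scores, and function names are hypothetical and are not taken from Kelley (541) or Toops (592).

```python
from math import prod
from typing import Sequence


def additive_prediction(scores: Sequence[float],
                        weights: Sequence[float],
                        constant: float = 0.0) -> float:
    """Ordinary regression form: a constant plus a weighted sum of scores."""
    return constant + sum(w * x for w, x in zip(weights, scores))


def multiplicative_prediction(scores: Sequence[float],
                              exponents: Sequence[float],
                              scale: float = 1.0) -> float:
    """Multiplicative form: a scaled product of scores raised to powers."""
    return scale * prod(x ** e for x, e in zip(scores, exponents))


# Hypothetical determiners: brightness, effort, comprehension of the text.
# The last is zero: the student cannot comprehend the text at all.
scores = [1.2, 1.1, 0.0]

additive = additive_prediction(scores, weights=[0.4, 0.4, 0.2])
multiplicative = multiplicative_prediction(scores, exponents=[1.0, 1.0, 1.0])
```

Under the additive form the two favorable determiners still produce a positive predicted scholarship, whereas under the multiplicative form a single determiner of zero amount forces the prediction to zero, as in the two observations above.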


Still more complicated regression equations, readily fitted by least 
squares methods, have been advocated by Toops (588). Curves involving 
second powers, for example, have the property of rising to a maximum 
and then descending, a phenomenon associated particularly with time, as, 
for example, in the well-known phenomenon of the growth of height. Such 
equations, of course, are too absurdly simple to be maximally useful, and 
probably we shall make little progress until we begin to employ real 
growth equations of such a degree of complexity that they stand a fair 
chance of being a good approximation to the “true” equation. The reviewer 
hazards the guess that Will’s equation (613) is of this minimal degree of 
complexity, since it has the property of rising to a maximum and then 
declining to a fixed asymptotic lower limit, a phenomenon often noticed 
in population growth, a property of crowds analogous with the phenomena 
of inertia and momentum of physical objects in respect to motion. These 
physical notions of a society are being developed by Dodd, a psychologist 
recently turned sociologist. 
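
As a minimal sketch of what fitting a second-power curve by least squares involves, the following solves the normal equations for y = a + b*x + c*x**2 and locates the turning point. The data are hypothetical and lie exactly on a known curve; the helper names are illustrative and do not reproduce any procedure given by Toops (588).

```python
from typing import List, Sequence, Tuple


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix by expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def fit_quadratic(xs: Sequence[float],
                  ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares coefficients (a, b, c) of y = a + b*x + c*x**2."""
    # Power sums of x and cross sums of y with powers of x.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Normal equations: row i is sum_j s[i + j] * coef[j] = t[i].
    normal: List[List[float]] = [[s[i + j] for j in range(3)] for i in range(3)]
    d = det3(normal)
    if d == 0:
        raise ValueError("need at least three distinct x values")
    coefs = []
    for col in range(3):  # Cramer's rule, one coefficient per column
        replaced = [row[:] for row in normal]
        for i in range(3):
            replaced[i][col] = t[i]
        coefs.append(det3(replaced) / d)
    return coefs[0], coefs[1], coefs[2]


def turning_point(b: float, c: float) -> float:
    """x at which the curve is a maximum (c < 0) or a minimum (c > 0)."""
    return -b / (2 * c)


# Hypothetical observations lying exactly on y = 2 + 3x - 0.5x^2.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [2 + 3 * x - 0.5 * x ** 2 for x in xs]
a, b, c = fit_quadratic(xs, ys)
peak_x = turning_point(b, c)
```

A negative coefficient on the squared term is what gives the curve its rise to a maximum followed by a descent; it cannot level off at a lower asymptote, which is the limitation the reviewer has in mind.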










Factor Theories 


The factor technics are attempts to derive an analysis of man’s capacities 
and abilities out of the internal consistency of their interrelationships as 
revealed in the intercorrelations of test scores. They may be humorously 
cataloged as an attempt to determine an equation without any equality 
sign and any left-hand member. These have been the rage of the past few 
years. The fundamental and earlier contributions of Spearman, Kelley, 
and Holzinger have been followed with later and fundamental contribu- 
tions by Thurstone (581, 584, 585, 586); Hotelling (538); Spearman 
(574); and lately, Wilson (614, 615). The appearance of a new title by 
any of these is acclaimed with joy by a very small camp of enthusiasts. 

The problem of factors is essentially the problem of the above-named 
analytical technics of patterns, but done by a different attack. So far, it 
must be acknowledged, the methods, the claims, and the speculations of 
rival camps, with the exception particularly of the main representatives 
of the camps above noted, have resulted in little other than excuses for 
magazine articles. Certain notable exceptions are Thurstone’s analysis 
(582) of the fundamental interests of man; Moore’s analysis (549) of 
the fundamental syndromes of the insanities; Reinhart’s (561) and Asher’s 
(484) attempts to derive an actual intelligence test by the use of factor 
technics; Anastasi’s analysis (483) of memory; and Dunlap’s analysis 
(510) of the learning of chickens. 

If factor theories cease to be a religion and become a statistical technic 
they promise much assistance in the analysis of causation of human be- 
havior; in important applications in test-building and aptitudes; and aid 
in the direction generally of learning, motivation, and social amelioration 
so far as these can be effected through control of the human variables. 


Analysis of Variance and Covariance 


A different angle is the analysis of variance and covariance, a method 
favored by Kelley (541) and Snedecor (573). Causality has been treated 
abstractly by Dunlap and Cureton (509) and with respect to mental life 
particularly by Moore (550). 


Wholesale Computation of Intercorrelations 


The factor theories and analysis of causation require, generally speak- 
ing, all the possible intercorrelations, means, and standard deviations of 
n variables, where n may be quite large. Accordingly, we see much interest 
in tabulating machine methods of obtaining the host of moments neces- 
sitated. The first monograph (608) of the Columbia University Statistics 
Bureau has become a classic. It is fair to predict that the recent text (485) 
on the application of Hollerith machines is but a forerunner of a new day 
in statistical computation, and a sign that eventually a host of basic refer- 
ence works will follow. Dunn (511) has devised a method of punching as 
many as 80 variables, each of 4,095 inclusive range, into a single Hollerith 
card, or a corresponding number of other punch-card equivalents, e.g., 
240 variables of 15 scores each. Royer and Toops (568) have shown that 
these 240 variables may consist of 15 coded scores, thus fixing the limits 
of present-day statistical machines at 240 variables, each of unlimited 
range. They have also developed the mathematics of the method. Before 
inclusive growth studies of all the varied functions of an individual may 
become feasible we shall have to render unlimited not only the number 
of persons and range of scores, but also the number of variables per person. 
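
The figures quoted for the punched card are consistent with treating the 12 punch positions in each of the 80 columns as binary digits: twelve positions give 2**12 - 1 = 4,095 as the largest value, and three groups of four positions give three scores of at most 15 per column, or 240 per card. Whether Dunn (511) proceeds in exactly this way is an assumption; the sketch below only checks the arithmetic, and its function names are hypothetical.

```python
from typing import List, Sequence

ROWS_PER_COLUMN = 12    # punch positions in one column of the card
COLUMNS_PER_CARD = 80   # columns on one card


def pack_column(scores: Sequence[int], bits_per_score: int) -> int:
    """Pack several small scores into the punch pattern of one column."""
    if len(scores) * bits_per_score > ROWS_PER_COLUMN:
        raise ValueError("too many scores for one column")
    pattern = 0
    for score in scores:
        if not 0 <= score < 2 ** bits_per_score:
            raise ValueError("score does not fit in its punch positions")
        pattern = (pattern << bits_per_score) | score
    return pattern


def unpack_column(pattern: int, count: int, bits_per_score: int) -> List[int]:
    """Recover the scores packed into one column, in their original order."""
    mask = (1 << bits_per_score) - 1
    scores = []
    for _ in range(count):
        scores.append(pattern & mask)
        pattern >>= bits_per_score
    return scores[::-1]


# One variable per column: largest value 4095, 80 variables per card.
max_single = 2 ** ROWS_PER_COLUMN - 1
# Three variables per column: largest value 15, 240 variables per card.
max_triple = 2 ** (ROWS_PER_COLUMN // 3) - 1
variables_per_card = COLUMNS_PER_CARD * 3

column = pack_column([15, 0, 9], bits_per_score=4)
recovered = unpack_column(column, count=3, bits_per_score=4)
```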


Printed Correlation-Solving Forms 


Three new printed correlation forms have appeared, prepared by Cureton 
and Dunlap (503), Smeltzer (572), and Tryon (600); five variations in 
methods of correlation-solving, by Chapin (496), Constance (498), A. 
Dvorak (512), Feldstein (516), and Stephenson (576); a method for 
solving the standard deviation by stencil by Toops (589); a method for 
solving mean square contingency by Royer (567); a stencil-method of 
fitting straight-line trends, regression of Y on X with one plot per X-unit, 
by Toops (591); several methods for simplifying the computations of 
partial correlations, multiple correlations, and multiple regression equa- 
tions, by Bakst (486), Franzen and Derryberry (518), Peters and Wykes 
(559, 560), Thomson (579), Griffin (525), Horst (533, 537), Wherry 
(610), Brolyer (490), and Adkins and Toops (481). A new least-square 
curve-fitting machine, for fitting straight-line trends, or logarithmic equa- 
tions in the linear form, has been devised by Gaines and Palfrey (519); 
and their use of the correlation coefficient to compare the efficiency of their 
machine with the ordinary algebraic methods has been criticized by Berk- 
son (487). A machine for the solution of determinants up to the tenth 
order has been invented at the Massachusetts Institute of Technology, an 
institution already famous for its tide-predicting, integrating, and other 
complicated mathematical machines. 

Research workers have become nomograph-conscious since the publica- 
tion of Dunlap and Kurtz’s classic Handbook of Statistical Nomographs, 
Tables, and Formulas (508). Other contributions to the literature have 
been made as follows: for standard errors of differences, by Edgerton 
(515); for Blakeman’s test of linearity, by Griffin (522); how to construct 
a nomograph, by Griffin (521), and a text by Allcock and Jones (482); for 
the coefficient of part correlation, by Griffin (524); for the Brown-Spear- 
man formula, by Griffin (523); a book of charts for computing tetrachoric 
r, by Cheshire, Saffir, and Thurstone (497); and for the standard error of 
biserial r, by McNamara and Dunlap (544). 


Tables 


Tables for facilitating computations appearing in the past three years 
include: general statistical tables for students of educational statistics, by 
Holzinger (531); Part 2 of Pearson’s Tables for Statisticians and Biometricians 
(557); tables for inter-percentile ranges, by Masters and Upshall 
(545); and for $\log (1 - r^2)^{1/2}$, useful in partial correlation, by Mori (551), 
the text being in Japanese, the tables in English. 


Slide Rules 


Slide rules have been much in evidence, also. There are three for calcu- 
lating ages: two by Constantine (499, 500), the second giving ages to days, 
and one by Yepsen and Dunlap (616), giving quotients for I.Q.’s as well. 
A generally useful statistical slide rule was developed by Dunlap and 
Kurtz (507). Heinemann (530), in Germany, discussed experiments to 
note the comparative excellence in use of four styles of slide rule, and 
voted finally for “simplicity.” 


Matched Groups 


A formula has been developed by Peters and Van Voorhis (558) which 
makes allowance for the coefficients of reliability in computing the standard 
error of the mean. The formula, $\sigma_M = \frac{\sigma}{\sqrt{N}} \sqrt{1 - r}$, 
where $r$ is the coefficient of reliability, approaches 
zero as the reliability of the measure used approaches unity. In the case 
of zero reliability, the correction for reliability disappears (becomes unity) 
and the formula reduces to the usual one for the standard error of the mean. 

This property is particularly important as revealing that much smaller 
than usual critical ratios are, under the conditions implied, “statistically 
of significance.” The article is taken from the chapter on reliability of a 
statistical text promised for early appearance from the Pennsylvania State 
College Press. 

A similar formula reported previously by Wilks (612) and Lindquist 
(543), allows adjustment for the correlation between the variable, on 
which groups are matched, and the variable being studied. The correlation 
between the two variables, $r_{xy}$, is used in the above equation instead of 
the coefficient of reliability. When used in obtaining the standard error 
of the difference of means, use of the formula eliminates the necessity of 
laboriously matching individual with individual, but requires only the 
equating of groups on the basis of means and standard deviations. It also 
allows the use of groups of unequal size. 

Rulon and Croon (569) described a method of eliminating cases from 
different groups so as finally to obtain groups with equal means and stand- 
ard deviations on the measure on which they are to be matched. 


Probable Errors 


Cureton (504) published an epochal monograph on the probable errors 
of many of the constants most used by educational workers. Other probable 
error formulas recently devised are: standard error of a multiple 
regression equation, by Miner (546); of a tetrad in samples from a normal 
population of independent variables, by Wilks (611); of the average 
intercorrelation and of the average criterion correlation, by Cureton (505); 
and also of the Spearman-Brown formula (506). 


It remains to be seen whether big-scale research, with its tens of thousands 
of cases, will make probable errors unnecessary, or whether we shall go 
to the other extreme and try to conduct our experiments with hardly any 
cases at all. Fisher’s treatise (517) on small sampling theory has become 
a classic for the biologists, agronomists, animal psychologists, and others 
who cannot well afford to indulge in statistical sprees of hundreds of 
thousands. So likewise has become Shewhart’s study (571) in the eyes of 
manufacturers who cannot afford to demolish even one $100,000 machine 
to guess whether the remaining nineteen ordered by a customer will work 
or fly to pieces. Certain it is that a probable error seldom—much more 
seldom than should be the case—has been used to control the course of 
educational experimentation. Usually such coefficients have been calcu- 
lated for publication as an afterthought, on the insistence of the embryo 
Ph. D.’s major adviser. It is well known that the famous dictum, “If you 
knew the history of a single tree you would know the history of the uni- 
verse,” was intended for rhetorical effect rather than for practical applica- 
tion. Nevertheless, our analysis of causation and our notions of reliability 
are bound up in our notions of the causes, sources, and amounts of error. 
It is hopeful, as well as humorous, that a new variety of error is announced 
monthly! 


Scoring Formulas 


The new test-scoring machines and test forms in themselves have not led 
to any new formulas, and it is axiomatic that a new formula in time leads 
to a host of new concepts and practices. Motivated by the development of 
such devices, however, we have a promise in the coding methods developed 
by Toops (595), and their accessory formulas, not only of painless test- 
scoring and regression-solving (using scoring books), item analysis, and 
multiple-choice-alternative-weighting, but also of a basic attack upon the 
problem of profiles of human ability. 


The Strong Interest Blank has been revised in its scoring-method and 
adapted to Hollerith machine scoring (hand selection of the recorded 
responses) by Strong and Green (577); and the International Business 
Machines Company (485:19) have devised an attachment for their tabu- 
lator making pre-sorting of the cards unnecessary. Stagner (575) devised 
a mimeograph additive method of weighting tests for which multiple regres- 
sion weights are available. Negative weights are overcome by adding a 
constant of sufficient size to each WX product to render all positive, and 
this is finally corrected for (subtracted out of) the constant term of the 
regression equation. The method has been adapted to the multiple-ratio 
technic by Toops (594). 
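
The handling of negative weights described here rests on a simple identity, sketched below: adding the same constant to each of the k weighted products and subtracting k times that constant from the constant term leaves the predicted score unchanged. The weights, scores, and function names are hypothetical and are not taken from Stagner (575).

```python
from typing import List, Sequence, Tuple


def regression_score(scores: Sequence[float],
                     weights: Sequence[float],
                     constant: float) -> float:
    """Ordinary regression prediction: constant plus the sum of W*X products."""
    return constant + sum(w * x for w, x in zip(weights, scores))


def shifted_regression_score(scores: Sequence[float],
                             weights: Sequence[float],
                             constant: float,
                             shift: float) -> Tuple[List[float], float]:
    """Add `shift` to every W*X product so that none is negative, and
    subtract it once per product from the constant term to compensate."""
    products = [w * x + shift for w, x in zip(weights, scores)]
    adjusted_constant = constant - shift * len(weights)
    return products, adjusted_constant + sum(products)


# Hypothetical three-test battery with one negative weight.
weights = [0.5, -0.3, 0.2]
scores = [10.0, 20.0, 5.0]
constant = 4.0

plain = regression_score(scores, weights, constant)
products, shifted = shifted_regression_score(scores, weights, constant, shift=10.0)
```

The shift must be at least as large as the most negative product that can occur over the whole range of scores, so that every tabled entry is positive and the clerk need only add.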

Criteria 


The obtainable validity of tests is limited by the reliability and validity 
of the criterion used. Since the classic treatment (556) there have been 
several basic contributions. Thorndike and his associates (580) employed 
wages, job satisfaction, and level of work attained as measures of occupa- 
tional success. Hoppock devised a scale for measuring job satisfaction. 
Richardson and Kuder (564) used a checking of “specifics” blank, evalu- 
ated by the Thurstone attitude scaling technic as a substitute for super- 
visors’ ratings in deriving a criterion. The employment of these will im- 
prove the criteria of the future. Easley (514) pointed out the increased 
reliability of long-time samples as versus short, and the possibility of 
greater validity thereby attainable. We cannot escape the need for more 
samples of our criterion. Turney (601) pointed out that a number of short 
tests have a cumulative reliability. The reviewer suspects that many short 
tests have the property of keeping effort at a maximum, so that the final 
examination more nearly reveals the true ability of the student. If study 
time be reduced to a constant, as at West Point, the validity of intelligence 
tests jumps markedly as shown also at Yale (556). This, in connection 
with Courtis’ observations (502) regarding the hodge-podge nature of our 
measurements and particularly our disregard of the intent (motivation) 
of the testee, suggests that our criteria are hodge-podges that could not be 
expected to be predicted well by any tests, however good. 
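
One standard way to express the cumulative reliability of several comparable short tests is the Spearman-Brown formula (called the Brown-Spearman formula elsewhere in this chapter). The review does not say that Turney (601) or Easley (514) used it, so the sketch below is offered only as an illustration, with hypothetical figures.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Reliability of a measure made k times as long, assuming the added
    parts are comparable to the original (Spearman-Brown formula)."""
    return k * reliability / (1 + (k - 1) * reliability)


# A hypothetical short test of reliability .50, alone and pooled with others.
single = spearman_brown(0.50, 1)
pooled_two = spearman_brown(0.50, 2)
pooled_four = spearman_brown(0.50, 4)
```

On these assumed figures, four short tests of reliability .50 pool to a reliability of .80, which is the sense in which more samples of the criterion raise the ceiling on obtainable validity.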

The problem in its essence is “What constitutes success?” It is clear, 
then, that practice in this realm will be benefited by whatever advances are 
made in factor theory, causation, variance, and patterns of traits. Edgerton, 
in an unpublished manuscript, proposed to find that system of least-square 
weights for the several criterion components, or variables, which, with a 
similar series of weights for the several tests of the test series, will maxi- 
malize the correlation between the two sides of the equation. 

Hotelling (539) solved the same problem by an appeal to a type of 
mathematics which will be unintelligible to most readers of this review. 
Whether the problem is answered philosophically by these mathematical 
solutions is an open question. Success is a profile. And when John Citizen, 
now prognosticated to be of average aptitude for occupations A and B, is 
faced with the alternative of choosing between A, in which he is predicted 
to be fast but inaccurate (therefore averaging out to mediocre), and B, in 
which he is predicted to be slow but accurate (again averaging out to 
mediocre), he may greatly prefer the latter to the
former. It is only, or mainly, for wage-payment purposes that we require
a unitary criterion rather than a profile. And when it is realized that wages 
are a means rather than an end, even wages may be paid on the basis of a 
profile of need: so much for food, so much for recreation, so much towards 
the publication of a book—whatever are the purposes of the recipient 
which wages may foster into a more abundant life. We, too, no doubt would 
pay our debts more gracefully if they were detailed (a profile), so much
of taxes, for example, for police protection, so much for water-main
repair, etc., instead of the uninteresting lump sum, “Taxes for six months,
ending June 30, $100.”


Swab and Peters (578) showed that pupil estimates of fellow-pupils 
have considerable validity in the case of estimating age, height, and marks 
in arithmetic, but do rather badly in estimating brightness. The reliability, 
for 30 cases only, ranges between .95 and .99 for six traits. These results 
suggest the employment as a research procedure of pupils’ judgments of 
pupils. Remmers (562) and Adams (480) suggested inserting in rating 
scales judgments of objective qualities such as length and weight as a means 
of measuring and eliminating halo and of measuring the effects of and 
giving practical meaning to such concepts as accuracy, error, and relia- 
bility. This is an improvement over Thorndike’s practice of inserting, say, 
beauty, as a joker to test the presence, but not very precisely the amount, 
of halo. 

Criteria may be improved by such methods, employing these technics
to test, pick, and choose (or weight) our judgments, analogous to the
several technics now being used to select, discard, and weight tests, sub-
tests, and items.


Reliability and Validity 


In a similar attempt to improve reliability we have the well-known 
methods of (a) lengthening the test; (b) interviewing and cross-ques- 
tioning the experimental subjects regarding their failures on individual 
items; and (c) working for internal consistency of the items by statistical 
means. We might hazard a guess that factor theories ultimately will be 
found useful here. The method of lengthening the test naturally languishes 
since the trend of the times has been towards shorter tests (results being 
equal), and since it is thought by competent people that it is more meri- 
torious to work for greater validity and allow reliability to shift for itself. 
Employing this principle, the method of interviewing has been used by 
Valentiner (602) in making tests more valid. A method, called the 
L-method, of shortening a test while maintaining its reliability (equally 
applicable to making a test shorter while maintaining its validity or to 
the construction of alternative forms of test) was devised by Toops (596) 
and used by Royer and Toops (566) and by Hartson (527) for picking 
tests and sub-tests of the greatest in-combination validity, as an alternative 
to multiple correlation procedures. Its basic concept is that a feasible or 
practical test must have the elements weighted with equal gross score 
weights; and the necessary formulas, involving in effect the intercorrelation 
coefficients (actually the gross score intermoments), are deduced and
applied to that problem. The process works by the build-up-a-scale proc-
ess, quite the converse of multiple-correlation methods where enormous
regression equations must be solved and the tests (or sub-tests, or items) 
of low weights then must be discarded. 
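
Method (a), lengthening the test, is commonly quantified by the Spearman-Brown prophecy formula, which the text does not state. A minimal sketch follows, assuming the added material is parallel to the original test (equal difficulty and intercorrelation); the function names are illustrative only.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made `length_factor` times
    as long with parallel material (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability
            / (1 + (length_factor - 1) * reliability))

def length_factor_needed(reliability, target):
    """Factor by which a test must be lengthened to reach `target`
    reliability, obtained by solving the formula above for the factor."""
    return target * (1 - reliability) / (reliability * (1 - target))
```

On these assumptions a test with a reliability of .50 would need to be made four times as long to reach .80, which suggests why lengthening competes poorly with the trend towards shorter tests.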

A text in the combined field which bids fair to become a classic has 
been written by Thurstone (583). 


Item Study 


The validity of a test, as well as its reliability, is conditioned by the 
items of which it is composed. That an item has validity, difficulty (mean 
score or percent right), and even reliability, puts the item in the same 
statistical category as the test or sub-test, with the added advantage that, 
existing in two degrees only (Right = 1; Failure = 0), it admits of the 
application of various probability and other technics not applicable to 
multi-categoried test scores. These technics have been reviewed recently 
by Osburn (555). 


Transmutation 


Heilman (529) achieved a transmutation of test scores by correcting for 
differences in mean, but not for differences in standard deviation. 

Toops (590, 599) raised the issue of translating, by means of the 
regression equation—which corrects for both—high-school marks obtained 
on university students originating in many high schools. He proposes to 
employ transmutation equations computed individually for each high 
school of a state, the principle being that pupils of like intelligence (or 
of a common score on tests of capacity or attainment) should be trans- 
muted to the same transmuted mark, no matter what the marking system. 
This corrects not only the marks of the average pupil, but differentially 
those of either extreme as well. It ignores motivation and quality of in- 
struction by assuming these equal in different schools for pupils of like 
intelligence. Ogan (553) established the principle that both a pupil's 
intelligence and his marks in other subjects should be used to determine 
whether a particular instructor has the “proper” distribution of marks, 
thus equating for both intelligence and motivation so far as the latter 
is measured by “marks in all other courses but this.” By such means criteria 
will be improved. 
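
The principle that pupils with a common test score should receive the same transmuted mark can be sketched as follows. This is an illustration under stated assumptions and not Toops's published procedure: each school's equation is taken to be the least-squares line predicting a common test score from the school's own marks, and the schools, marks, and scores are invented.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def transmutation_equation(marks, test_scores):
    """Return a function turning one school's mark into a transmuted
    mark expressed on the scale of the common test."""
    intercept, slope = fit_line(marks, test_scores)
    return lambda mark: intercept + slope * mark

# Hypothetical schools: the same three test scores earn widely spread
# marks in school A and high, tightly bunched marks in school B.
school_a = transmutation_equation([70, 80, 90], [40, 50, 60])
school_b = transmutation_equation([85, 90, 95], [40, 50, 60])
```

Here a mark of 90 in school A and a mark of 95 in school B both transmute to 60. The intercept absorbs the difference in mean and the slope the difference in spread, whereas a correction for the mean alone would leave the two extremes of school B too close together.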

Horst (532, 534, 536) published three articles on the matter of obtaining 
comparable scores in distributions. 

Toops (594) contributed a method whereby qualitative data may be 
quantified and then employed in regression equations as readily and as 
meaningfully as strictly quantitative data. The principle is that categories 
must be quantified by assigning to each, as a transmuted score, a quantity 
proportional to the average criterion score of the category in question. The 
same principle rectifies curvilinear regressions. Bingham (489) suggested 
that qualitative data may be accorded a transmuted score which is the per-
cent of those individuals of the category in question who attain “success,”
this being arbitrarily defined as a “passing point.” All of these indicate
a healthy striving for “the proper units” of measurement. 
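
The two quantification rules just described may be sketched briefly. The code is illustrative only: the function names and data are hypothetical, and treating a criterion score equal to the passing point as a success is an assumption not stated in the text.

```python
from collections import defaultdict

def category_scores(categories, criterion):
    """Score each category by the mean criterion score of its members."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, score in zip(categories, criterion):
        totals[category] += score
        counts[category] += 1
    return {c: totals[c] / counts[c] for c in totals}

def percent_success(categories, criterion, passing_point):
    """Score each category by the percent of its members whose criterion
    score reaches the passing point (>= is an assumption)."""
    passed = defaultdict(int)
    counts = defaultdict(int)
    for category, score in zip(categories, criterion):
        counts[category] += 1
        if score >= passing_point:
            passed[category] += 1
    return {c: 100 * passed[c] / counts[c] for c in counts}

# Hypothetical data: home background and a criterion score for six pupils.
background = ["farm", "city", "farm", "town", "city", "farm"]
criterion = [60, 80, 70, 75, 90, 65]
mean_codes = category_scores(background, criterion)
success_codes = percent_success(background, criterion, passing_point=70)
```

Either set of codes can then replace the category labels in a regression equation. Because the codes are derived from the criterion itself, they would ordinarily be checked on a fresh sample before much weight is placed on them.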


Norms 


Courtis (501) pointed out that norms are all but meaningless unless we 
take into account the characteristics of the individual’s growth curve. As 
an extreme illustration, a person growing at a normal rate, one year of 
mental age per chronological year, and therefore destined ultimately to 
attain a normal adult intelligence, may get a late start and so year by 
year improve in I. Q. As a corollary, an I. Q. of an individual cannot 
be ascertained by one measurement alone, three being the very minimum 
necessary. 
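
The extreme illustration above can be worked through arithmetically. The sketch assumes the conventional ratio I. Q. (100 times mental age divided by chronological age) and a hypothetical pupil whose mental growth begins two years late but then proceeds at one year per year; the function names and figures are illustrative only.

```python
def iq(mental_age, chronological_age):
    """Ratio I.Q.: 100 * M.A. / C.A."""
    return 100 * mental_age / chronological_age

def late_starter_iqs(delay_years, ages):
    """I.Q. at each chronological age for a pupil whose mental age lags
    by a constant `delay_years` while growing at the normal rate."""
    return [(age, iq(age - delay_years, age)) for age in ages]

# Hypothetical pupil measured at ages 6, 10, and 14 with a two-year lag.
measurements = late_starter_iqs(2, [6, 10, 14])
```

With these assumed figures the I. Q. rises from about 67 at age six to 80 at age ten and about 86 at age fourteen, although the rate of growth never changes. A single measurement could not separate such a case from one of genuinely slow growth.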

O’Rourke (554) pointed out that the proper norms (standard scores), 
based on the groups in industry with which a pupil later will compete, 
are essential to the construction of profiles for guidance purposes. 


Technics of Selection 


The line between guidance and selection is always a thinly drawn one. 
That the two are not necessarily incompatible if we look at the matter 
from the educational statesmanship view, was shown by O’Rourke (554) 
and by Toops (590, 593) who (597) also pointed out the dangers of the 
at present much used and much abused successive hurdles method of selec- 
tion. By transmutation of marks and collection of other to-be-had-for-the- 
asking variables the seniors of the high schools of a state may be ranked 
on a scale of college aptitude which will be most useful in college recruit- 
ing, and, if wisely done, serve the ends of good guidance, since college 
is one of the “opportunities.” The necessity for a measure of “financial 
need” is pointed out. 


Standardization 


That statistics is difficult because of its lack of standardization of symbols 
and formulas was shown by West (609) and Yntema (617). Monroe (548) 
proposed that a certain list of statistical symbols be regarded as standard; 
while Toops (598) proposed standard codes for coding qualitative data, 
and transmutation equations for quantitative data. 


Novel Applications of Statistics 


Reymert (563) used the bids technic to create a scale for measuring the 
ability of the psychologists of the several colleges of a state. Ward (607) 
used the multiple-ratio procedure to ascertain the salary formula of a 
college faculty as a means to setting up the proper faculty record forms. 
Burgess (493), Glueck and Glueck (520), Tibbitts (587), Vold (603), 
and Monachesi (547), sociologists, employed the traits, environments, 
and treatments of prisoners as “presence-absence” variables, reversed if
the correlation were negative, to predict ability to succeed on parole, and
obtain validities which should figure out to a coefficient of .50 or better. 
Kurtz (542) employed the multiple correlation technic to determine the 
point-values which statistically should be allocated to letter grades in 
college courses, utilizing as criteria two measures of subsequent success— 
persistence and graduation—and two measures of previously demon- 
strated ability—high-school marks and intelligence. Thus, marks of A, B, 
C, and D may be accorded positive weights of 4, 3, 2, 1 without much 
error, from the viewpoints of subsequent predictability and subsequent 
use as prognosticators of persistence and graduation; but the grade E 
(failure) must be accorded a sizable negative score if we would properly 
account for its deleterious effects on persistence and graduation, thus
raising an interesting problem in the motivational possibilities of marks. 

Nanninga (552) fitted exponential curves to his data regarding costs, 
size, and offerings of California high schools. Courtis (501) devised growth 
equations and a notation capable of taking account of superimposed cycles
of growth. Gulliksen (526) computed a rational equation of the learning 
curve based on the law of effect (motivational effect of satisfaction or 
dissatisfaction), one of the few attempted rational equations of psychology. 
This, together with Thurstone’s earlier logical learning equation, com- 
prises practically all the logical equations the educational psychologist 
has to show for his labors to date. No treatise on the methods of derivation 
of logical laws seems to be available, although the month that does not 
bring forth a new statistics text is rare indeed. 


Summary 


The high points in statistical development of the past three years are: 


1. Definite attempts to improve statistics teaching and statistical reporting, through 
analysis of the mathematical and learning difficulties of students and through standard- 
ization of statistical symbols, procedures, and forms of reporting 

2. Introduction of the topic of estimation on a scientific basis 

3. A growing literature on profiles and patterns, their representation and significance, 
a logical development of the recent emphasis upon factor theories, unique traits, and the 
“elements” of behavior generally (Having the elements at hand, we must compound 
them. The result is profiles.) 

4. The growing realization that the profile, rather than a single index, is the most 
meaningful way to represent all the “variables” in the educational process—human
traits, environments, occupational (including educational) results (criteria), and costs 

5. Wholesale, statewide testing programs with their inherent research possibilities 
of locating all persons of profile number 318 for follow-up study and comparison with 
those of profile number 317, an alternative to multiple regression methods 

6. The possible subsequent utilization of all human experience—“natural experi- 
ments”—as criteria 

7. The growing interest in the characteristics of a group in the light of the sub- 
societies of which it is compounded 

8. Excursions into the domain of more complicated equations, particularly those in
which time is a function; and the provision for multiplicative as well as additive rela-
tionships

9. The appearance of a few applications of factor theories among a host of theoretical
articles on the topic 

10. Immense advances in applications of statistical machines to research, and in the 
means of representation of statistics, and the utilization of qualitative and irregular 
data (A healthy interest in the units of measurement is noted.) 

11. A goodly and much-needed beginning in the theory of criteria 

12. Progressive developments in multiple regression solving, in item analysis, and 
in alternative technics of test building 

13. A beginning by two or three persons of the utilization of time in equations and 
of consideration of the possibility of logical laws in place of empirical laws 

14. A growing number of applications of statistics to novel problems. 


By way of criticism and suggestion we may note that: 


1. Our concepts of success and its measurement by criterion scores are in a chaotic 
state, particularly the foundational philosophy. 

2. The development of logical equations is scarcely appreciated as a need by the 
rank and file of research workers. No text on the methods of logical derivation of laws 
seems to be available. 

3. Few experiments are ever repeated or reverified. 

4. We need equations which shall take account of the effect upon the individual's 
products (criteria), not only of his traits but also of his associates and environments. 

5. We may be lulled into a false sense of scientific security and well-being by the ease 
with which soon we shall be able to do researches by the available methods, and in our 
preoccupation with quantity of research may fail to take account of the need for laws 
and equations which are more complicated but more true to life. 

6. There is a danger that the development of small sample theory and the new 
formulas for unreliability coefficients may be used as excuses by many for slip-shod 
research on too few cases. The publication of unreliability coefficients (even small ones) 
does not guarantee the reliability or validity (the dependability) of research. 

7. There is a general failure to consider any test score as a cross-section view of a 
time-growth curve with all its statistical implications. 

8. There is a need for the development of statistical machines which shall render 
unlimited not only the numbers of persons and the range of the variables, but also the 
number of variables. To treat each of a hundred variables as a growth process, annually 
measured, will require, say 2,000 variables per person. 

9. There is need for the statistical and theoretical development of the topics of 
transmutation, employment of qualitative data, and statistics of societies in the light of 
their component sub-societies and individuals. 

10. There is need for the theoretical development of the statistical and philosophical 
relationship of guidance and selection, viewing the problem as a national, or inter- 
national problem in planning; and for a succinct statement of the statistical theories of 
selection now implied in such practices as the successive hurdles method of the schools 
in occupational preferment, the practices of civil service, of honors-granting bodies, 
and of employment bureaus in industry. 

11. We need researches into the salary formulas of schools and industry, a problem 
growing out of the preceding one. 





CHAPTER V 


General Survey of the Field of Character and 
Personality Measurement 


IN TREATING THE MATERIALS for the present issue of the Review of Educa-
tional Research, the writers have taken as a broad basis for division the
general dichotomy of implicit and explicit behavior, personal and social, or 
internal and external. Inner adjustment, interests, and attitudes are repre- 
sented in Chapters VI and VII. In these divisions we have the person’s feel- 
ings, emotions, conflicts, and verbalized attitudes toward various issues. It 
is recognized that these may be independent of, or only partially dependent 
on, information, and may or may not be expressed in action. All technics 
of personality study which utilize an external observer have been treated 
in a final chapter (VIII) under the general heading of measures of infor- 
mation and conduct. In the general division, tests of information concern- 
ing laws, customs, and conventions have been treated as overt performance, 
since they have an objectivity in the external situation. The classification 
adopted does not, of course, entirely avoid overlapping on the basis of 
topics, methods, or particular investigations. 

The volume of pertinent material in the three-year period has necessitated
a selective treatment. An attempt has been made to retain references which
illuminate the general issues, develop new technics, or summarize bodies 
of material. A number of bibliographic summaries, books, and journals 
have appeared which cut across the plan of division of the report or are 
indicative of trends which affect the field as a whole. These are described 
in the present chapter. 


Reviews and Books 


Watson (634, 635) published comprehensive reviews of the field in 1932 
and again in 1933. These have been brought down to July, 1934, by Maller 
(625, 626), who has also made an intensive three-year study of publica- 
tions in the German psychological literature (627). A convenient list of 
character and personality tests and ratings may be found in Hildreth’s 
general bibliography (622). The compilation gives author, publisher, and 
a pertinent reference to each instrument available up to about 1933. Some 
of the better known tests and ratings for social adaptation have been listed 
by the United States Office of Education (630). 

Fauville (618) reviewed experimental studies dealing with the structure 
of personality. His interest was in the mutual relationship between phy- 
sical, temperamental, and character traits. Jones (623), in addition to 
a review of the general literature on methods of character measurement, 
presented a tabular summary of test situations which have been employed
for performance, knowledge, and attitudes. Garrett and Schneck (620) de-
voted one chapter in their general treatise on psychological tests, methods,
and results to the measurement of personality and temperament. 

Haddock and Ellis (621) prepared a source book for the use of persons 
engaged in teaching or studying the development of character. The authors 
give an account of the history of the study of character and personality 
from about 415 B.C. to the present. The book contains a bibliography of 
803 titles and 18 pages of subject-matter index.

The book by the Murphys (629) on experimental social psychology 
gave a critical review and discussion of many of the current investigations 
in social relationships. Murphy and Jensen (628) dealt with a general 
theoretical organization of personality research, including the approach 
from experimental psychology, from psychoanalysis, and genetic study. 

Symonds (632) prepared an exhaustive treatment of methods for diag- 
nosing personality and conduct. Observations, rating methods, question- 
naires, tests, free association methods, and physiological measures were 
considered. Special chapters were also concerned with the technics of inter- 
viewing and psychoanalysis. Judgment of character from physical signs 
and measures of the environment received consideration. This work was 
followed by a treatment (633) of character and personality data with 
particular reference to social adjustment as concerned with criminal ten-
dencies, mental disorders, vocational fitness, and citizenship. A helpful
classified descriptive list of tests, questionnaires, and rating scales was
included.


The annual summaries of research in progress which have appeared in the 
American Journal of Sociology indicate that the problems of personality 
in relation to the culture represented a predominant interest of both stu- 
dents (631) and mature investigators (624). 


New Journals 


The three-year period covered by this Review has seen the birth or 
major development of several periodicals devoted wholly or in part to 
problems of character and personality. The American Journal of Ortho- 
psychiatry (145 East 57th Street, New York City), was begun in 1930. 
The emphasis of the Journal is on clinical approaches to treatment. Prac- 
tically every issue, however, contains articles based on standardized test 
situations for measured appraisals of personality or social factors or on 
the statistical treatment of qualitative items. Child Development (Williams 
and Wilkins Company, Mount Royal and Guilford Avenues, Baltimore, 
Maryland) began publication in March, 1930. It regularly carries material 
pertinent to the measurement and development of character and person- 
ality in children. The journal, Character and Personality (Duke University 
Press, Durham, North Carolina), was founded as an international quar-
terly, with the first number appearing in September, 1932. The journal
tends to be philosophic and speculative in character, but it is catholic in
its range and contains technical reports as well as material which supplies
a frame of reference for measurement studies. The Journal of Experimental 
Education (Edward Brothers, Inc., Ann Arbor, Michigan), first issued in
September, 1932, has devoted space to a number of articles on observa-
tional and attitudinal measures of personality. The first issue of Character
(Religious Education Association, 59 East Van Buren Street, Chicago, 
Illinois) appeared in October-November, 1934. This periodical attempts 
to generalize and popularize technical material for persons who are respon- 
sible for programs of action. 


Applications 


A survey of tests by Fenton and Wallace (619) in twenty-eight child 
guidance clinic centers in the United States indicated that the bulk of 
clinical instruments in the field of personality and character represent 
editions or special scoring methods for the type of personal reporting 
initiated by Woodworth. This is, of course, the most convenient plan for 
clinical use since the patient must supply the information. The same gen- 
eral trend was found by Witty and Theman (636) with somewhat greater 
evidence of the use of tests of the May-Hartshorne type. 

It would appear that the development of tests of conduct and informa- 
tion and of systematic observations and ratings in the social setting from 
which the child comes has affected only slightly the general practices of 
clinics. 


Needs 


While we are concerned in this chapter primarily with the present state 
of research on means of studying personality, it appears well to point out 
certain avenues of scientific thinking which suggest the need for new 
directions. While usually ready to concede the technical gains made in 
measurement approaches, the Gestaltists and psychoanalysts have also 
illustrated the desirability of a broader theoretical orientation upon which 
to build research programs which are concerned less with the peripheral 
responses and more with the internal aspects of the problem. There is need 
to use the quantitative approaches on the problem of how personality 
becomes organized in interaction with the culture, and to have more longi- 
tudinal studies in which persons are followed over periods of time. 





CHAPTER VI 
Mental Hygiene and Emotional Adjustment 


Instruments for Self-Report 


IT HAS BEEN common practice to devise a set of questions (or to modify a
set prepared by previous workers) asking subjects to tell of their troubles, 
their symptoms, their desires, their behavior tendencies, etc. Wrightstone 
(758) used the oldest, the Woodworth-Mathews, to compare with teachers’ 
case studies of pupils. Thurstone’s schedule was criticized by Harvey (676), 
compared with clinical studies by Hanna (673), and with hospital diag- 
nosis of psychotics by H. N. Smith. Willoughby (756) worked out a short 
form derived from Thurstone’s and published norms. Bernreuter (644) 
combined questions by Thurstone, Laird (introversion), and Allport 
(ascendancy), adding scores for self-sufficiency (643) and defending 
reliability and validity (645). Marshall reported a try-out of the Bern- 
reuter schedule on 371 consecutive patients in a neuropsychiatric hospital, 
showing the median psychoneurotic at the 80th percentile of the normal 
population in neuroticism score, the schizophrenics all above the 60th 
percentile norms on introversion, the manics all below the median in intro- 
version. Stagner (729) agrees. Murray (705) arranged a set of 46 ques- 
tions which differentiated emotionally maladjusted and delinquent boys 
(age eleven to sixteen) from controls. Cavan (657) modified this slightly 
for use with girls and reported a correlation of .6 or .7 with the scale of 
24 questions used in the White House Conference studies. Ingle and Barton 
report giving 150 questions on emotional stability to college students. 
Whatever 75 percent of the people answered they called the right or 
“stable” answer, and anyone with twenty or more uncommon answers was 
labelled neurotic. This is typical of the worst in blind objectivity. White 
and Fenton (749) found their collection of questions differentiating little 
between delinquent boys supposed to have inferiority complexes, and 
normal delinquents. Bell (642) published a new adjustment inventory of 
140 questions classified as bearing upon home, health, social and emo- 
tional adjustment. Smith’s Self-Comparison Inventory (727) seemed to 
measure inferiority feelings among high-school students with satisfactory 
reliability. The Presseys (714) revised the X-O to permit scoring for emo- 
tional age. They found problem children more apt to fall outside the 
middle range of scores. Flowers compared the old Pressey X-O scores for 
45 psychotic patients who improved enough to be discharged and found 
no difference between them and the scores of those who became worse. 
Maller’s Character Sketches (697) with two parallel forms contain
statements descriptive of habits, self control, social adjustment, personal
adjustment, symptoms of mental disorder, and readiness to confide. The
subject tells whether he feels that he is the same (S) or different (D). 
Reliabilities for the junior high school were found to be over .9. Correla- 
tions with Woodworth-Mathews and Thurstone ran about .6. Because of 
simple language and broad scope, this instrument is probably the best 
available tool for self-description by pupils of junior high-school level. 
Results with this, as with every self-report form, must be interpreted, of 
course, in terms of the impression the subject wishes to give. 

Guilford (671) reviewed 115 studies on introversion-extroversion, noting 
as did Gilliland some low intercorrelations due to widely different defini- 
tions of the concepts. New questionnaires set up by Root, Roberts, and 
Kubo seem to add little. Studies by Stagner and Pessin and by Guilford 
contain painstaking statistical analysis of introversion-extroversion items. 
Guilford and Hunt (672) also found the rate of fluctuation of a reversible 
cube perception, previously suggested by McDougall as an index to intro- 
version-extroversion, unrelated to any other supposed index of these types. 
Mierke (704) used as an index of extroversion the ease with which a sub- 
ject, working at a pleasant task with unpleasant colors, would come to like 
the colors, or working at an unpleasant task with preferred colors would 
come to dislike the colors. 


Manzer (700) reported self-ratings on the Allport Ascendancy-Submis- 
sion Test to contain the usual distortion toward making a favorable im- 
pression. Beckman modified the A-S scale for business use. 


Inventories to help make case histories more complete and more objec- 
tive have been developed by Stefanescu and others (730), Scherke (723), 
and Patry (709). 


The major new technic, so far as self-rating instruments are concerned, 
has been the applications of factor analysis. Willoughby (754) began with 
the six constellations Darrow had previously found in the Thurstone scale. 
In both multiple-factor analysis and two-factor analysis the category 
called “Fantasy” seemed heavily loaded with the first general factor, while 
the categories “Sex” and “Parental” showed least. The Guilfords (670) 
worked out the relationship of each of 36 items to a general factor of 
introversion. Thurstone’s multiple-factor analysis suggested at least 18 
group factors. Pallister (708) analyzed a battery of measures (Lecky’s 
200-question Individuality Record, a vocabulary test on 160 words, a 
personal data sheet, ten rating scales, and a set of physical measures) to 
find the relationship of each to a general factor: tendency of the person 
to withdraw from the environment. Perry (710) gave a battery (Bernreuter, 
Pressey X-O, Allport A-S, Colgate B2 and C2, intelligence, and achieve- 
ment) to 300 college entrants. Ignoring the distortion of answers by the 
conditions under which the questionnaires were answered, he went ahead 
with correlation, tetrad, and multiple-factor analysis. He found some 
reason for believing that neurotic tendencies, dominance, and intelligence 
are independent variables. Present uncertainty regarding the psychological
value of the factor-analysis technic would seem to be supported by these
studies.

Among other suggestions for advance in technic mention should be 
made of: Vernon’s re-emphasis (739) upon the importance of the attitude 
of the person tested; Rosenzweig’s proposal (722) that the subject be 
asked not what traits he has, but the question more likely to be truth- 
fully answered, regarding traits he would like to have; and Uehling’s 
argument (738) against questions too difficult and too pointed. 

The usual criticisms of self-report measures are: 


1. Subjects do not answer reliably.

2. Answers are arranged to make a good impression; this is particularly true of the
more intelligent or experienced persons, and in situations in which they suspect that
their vocational status may be affected by the outcome.

3. None of the available measures has been built by persons with sufficient clinical
experience and insight to permit the instrument to be really diagnostic. Counting
symptoms is not diagnosis.


A few investigations relate to these criticisms. 

Cavan (658) checked the consistency of answers to a neurotic inventory 
by repetition within a week. Agreement was reported on 83 percent of 
the items. Siblings answering questions about their home agreed to the 
extent of 93 percent on factual items, and 62 percent on estimates. Bain 
(637) repeated 61 items of personnel information after 2½ months, with
identical responses on only 77 percent of the items. Differences between 
items classified as “factual family,” “factual personal,” and “subjective 
personal” were not significant. 
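
The agreement figures quoted above are percents of items answered identically on two occasions. A minimal sketch of that computation, with a hypothetical function name and invented answers, is:

```python
def percent_identical(first, second):
    """Percent of items answered identically on two administrations."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty answer lists of equal length")
    same = sum(a == b for a, b in zip(first, second))
    return 100 * same / len(first)

# Hypothetical answers of one subject to four items, given twice.
agreement = percent_identical(["yes", "no", "yes", "yes"],
                              ["yes", "no", "no", "yes"])
```

Such a figure makes no allowance for agreement expected by chance, so that a two-choice item answered at random on both occasions would still show about 50 percent agreement.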

Hertzberg (682) repeated the Thurstone schedule after a year in college 
and found about 80 percent of the students showing some improvement 
in score. He found the usual zero correlations with intelligence and school 
achievement, although the “extremely well-adjusted group” both years 
earned fewer grade points than did groups at any of the four lower adjust- 
ment levels. 

Luh and Sailer (694) found that self-ratings by Chinese students in 
China showed the same tendency to over-estimate one’s own good qualities 
which has frequently been observed in this country. 

No attempt has yet been made to construct instruments which will show 
symptoms in relation to underlying causes, patterns, or what Wertheimer 
calls the “radex” of personality. 


Applications of Self-Report Instruments 


No dependable results of scientific or practical interest have emerged 
from the frequently reported giving of some blank or other to delinquents 
and criminals (e.g., the Bernreuter by Hargan, the Neymann-Kohlstedt 
by Ball, the Thurstone by Garrison, and also by Simpson) with no con- 
trols. Simpson (725) found prisoners exceeding college students on favor- 
able self-ratings. Stevens (732) compared 100 recidivists and 100 college 
freshmen, finding the former more strictly brought up, more religious, 
less friendly with father, coming from larger families. Courthial (661) 
compared 82 delinquent adolescent girls with a much larger control 
group, equated for C. A., M. A., I. Q., home background score (Burdick 
test), and socio-economic status (Sims score card). Significant differences 
indicated fewer acts condemned, more worries (Pressey X-O), more 
symptoms of maladjustment (Woodworth-Mathews), more cheating 
(C. E. I. test), more persistence (C. E. I. test), more enjoyment in being 
alone, stronger preference for boy friends, more parental disapproval of 
friends, more dates, less family recreation for the delinquent group. 

School success has been another tempting goal for prediction by means 
of symptom questionnaires. Stagner (728) reviewed 45 studies showing 
almost uniformly low, zero, or slightly negative correlations between 
favorable personality report and school achievement. His own correlation 
of grades with the Pressey X-O, his A-B-C questionnaire, the Allport A-S, 
the Laird C-2, the Neymann-Kohlstedt, Thurstone Neurotic Inventory, and 
the four Bernreuter scores showed only one coefficient as high as .15 with 
P. E.’s from .04 to .08. Hertzberg (682) found the same for freshmen 
in a teachers college. Flemming used four forms of score from the Pressey 
X-O and found correlations with grades ranging from —.1 to —.3. Hend- 
rickson and Huskey found a relationship of —.1 (girls) or .3 (boys) 
between achievement and extroversion among fifth-grade children. Harris 
(674) studied 450 Jewish students entering City College, New York, and 
obtained correlations of many factors with grades. With intelligence held 
constant, partial correlations with age, weight, height, Payne’s inferiority 
test, Marston’s introversion-extroversion scale, number of recreations re- 
ported, number of periodicals read, hours of athletics, hours of sleep, 
number and kinds of books read, were all less than .2. Hours of study 
showed a correlation of .3 with grades when intelligence was constant. 
Symonds and Block (735) have published a questionnaire to help locate 
personal and social maladjustment in grades 7-14. 
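
Two statistics recur in the paragraph above: the probable error (P. E.) attached to a coefficient, and the partial correlation obtained when intelligence is "held constant." A minimal sketch of both follows, using the classical formulas P. E. = .6745(1 - r²)/√N and the first-order partial correlation. The function names, the sample size, and the three input coefficients are hypothetical; they are not taken from Stagner or Harris.

```python
import math


def probable_error(r, n):
    """Classical probable error of a correlation r based on n cases."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with the third variable z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical case: a coefficient of .15 obtained on an assumed 100 students.
print(round(probable_error(0.15, 100), 3))

# Hypothetical coefficients: study hours with grades (.25), study hours with
# intelligence (-.10), and grades with intelligence (.50).
print(round(partial_correlation(0.25, -0.10, 0.50), 3))
```

Under the older convention a coefficient was seldom taken seriously unless it was several times its probable error, which is why a value of .15 with a P. E. of .04 to .08 is treated as negligible.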

Applications to home life are implicit in Campbell’s review (651) of 
75 studies on personality adjustment in only children. Most of these found 
little evidence for the association of any type of personality with the only 
child. In Campbell’s own work only children showed greater neuroticism 
and greater variability in scores on Bernreuter items, but no differences 
in physique or scholarship. Maller found only children rated as least 
cooperative, but less deceptive on cheating tests than any others except 
children from two-child families. 

A. J. Davis (662) found a correlation of .3 between favorable answers 
on the Woodworth-Mathews and difference in M.A. between the child’s 
parents. There was a correlation of .2 between “Scatter” on the Binet 
and the W-M score. 

Harvey (678) reviewed ten studies of sex behavior and elsewhere he 
(677) appraised the value of questionnaires in such studies. He recom- 
mended: (a) use of a truly representative sampling; (b) follow-up letters; 
(c) checks within the questionnaire itself; and (d) checks with physical 
examinations and personal interviews on samples of the subjects. Hurlock 
and Klein published a questionnaire study on “crushes” among adoles- 
cents. Childers and Hamil (660) studied case records on 469 children 
and concluded that there were most conduct difficulties among the 106 
weaned from the breast between first and fifth month, fewest among the 
137 weaned after eleven or more months. Leonard interviewed freshman 
college girls and their mothers on problems of college life. 

An instructive method of study is the comparison of many possible 
factors which may differ in subjects well adjusted and those maladjusted. 
Cavan (659) reported that 30 percent of 9,000 adolescents have wished 
they had never been born. These children acknowledged many other malad- 
justment symptoms and were rated by teachers as maladjusted. The wish 
was more common among Mexican children, least so among negro urban 
children, but geographical and racial differences were minor. The un- 
happy children were more apt to be critical of both parents, to have no 
confidential relationship with the parents, to be punished often at home, 
to have no close friends. Lester and Barnette (690) compared college 
freshmen of the most maladjusted quartile (Thurstone test) with the best- 
adjusted quartile. They found the former group to be poorer in intelligence 
and achievement tests and slightly better in grades, with more business 
rather than professional fathers; to contain more Jews, more oldest chil- 
dren, but fewer only children; to be more introverted, less able to con- 
centrate, more severely disciplined at home, and more apt to report their 
childhood as unhappy. Wang (744) compared the top and bottom 20 
percent on each of five tests, using also other factors of personal history. 
Ascendant students reported reading more omnivorously for pleasure, 
being more admired by associates, participating more in games, liking for- 
eign languages less. Introverted (as contrasted with extroverted by the 
Freyd test) had fewer playmates, needed urging to get into games, had 
practically no friends of opposite sex, went to shows alone. Stagner found 
introverts not reliably different from extroverts on any of ten speed meas- 
ures (Downey W-T type) or on height-weight ratio. 

The Bernreuter test was given by Carter (655) to 133 pairs of twins. 
Correlations of age (12-19) and I. Q. (73-140) with the four Bernreuter 
scores were all negligible. Correlations between monozygotic (40) pairs 
were .63 on the neurotic inventory, .44 on self-sufficiency, .50 on intro- 
version, and .71 on the dominance scale. Correlations among fraternal 
like-sex twins (43 pairs) were, for the same four items: .32, —.14, .40, and .34. 
For 35 pairs of twins of opposite sex, the correlations were .18, .12, .18, 
and .18. The reviewer rates this evidence for the inherited basis of self- 
report of maladjustment as probably the most valuable contribution in 
this field. 
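
One rough way to read Carter's coefficients is to double the difference between the identical-twin and the like-sex fraternal correlations. This is a later rule of thumb (commonly called Falconer's formula), not a computation made by Carter or by the reviewer, and it assumes that both kinds of twins share their environments to the same degree. The sketch below applies it to the coefficients quoted above; the function and variable names are illustrative only.

```python
def falconer_estimate(r_identical, r_fraternal):
    """Twice the gap between identical and fraternal twin correlations."""
    return 2.0 * (r_identical - r_fraternal)


scales = ["neurotic inventory", "self-sufficiency", "introversion", "dominance"]
r_identical = [0.63, 0.44, 0.50, 0.71]             # 40 monozygotic pairs
r_fraternal_like_sex = [0.32, -0.14, 0.40, 0.34]   # 43 like-sex fraternal pairs

estimates = {
    scale: round(falconer_estimate(mz, dz), 2)
    for scale, mz, dz in zip(scales, r_identical, r_fraternal_like_sex)
}

for scale, value in estimates.items():
    print(f"{scale}: {value}")
```

An estimate above 1.00, as occurs here for self-sufficiency, is impossible for a true proportion; it indicates how unstable coefficients from some forty pairs are, and the remaining figures deserve the same caution.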

Reusser (716) applied Sweet’s Personal Attitude Test for Younger Boys 
to 423 delinquents and a control group in the public school. The delin- 
quents were more critical of the average boy, showed more feeling of being 
different from other boys, less insight in telling how other boys feel, and a 
greater tendency to regard their own responses as ideal. 

Mathews (702) constructed a questionnaire called a “Home Blank” 
which was given to 568 children. Each question of fact was answered 
“Yes” or “No” giving a “Condition Score,” and also reacted to by “Glad” 
or “Sorry,” giving an attitude score. A “socially approved” score for 
condition and another for attitude was worked out by noting the extent to 
which the child’s attitudes were in agreement with what adult judges be- 
lieved would be generally approved. The children whose mothers worked 
out had a socially approved condition score of 72, while those of non- 
working mothers averaged 78. On socially approved attitude the scores 
were 84 and 86, respectively. Happiness scores for children of mothers 
who worked averaged 84, for the non-working, 88. 

Jersild and his associates (685) questioned 400 children, 25 boys and 
25 girls at each age level from five through twelve, regarding fears, dreams, 
daydreams, likes, dislikes, most unpleasant memories, happiest memories, 
ambitions, wishes, and preferences. Repetition of the interview after five 
to eight weeks gave 65 percent overlap in the content. The study showed 
the relation of each type of affective response to age, sex, social status, 
and I.Q. 

Other comparisons and correlations show: sex differences (Weinberg, 
Stagner); age differences (Stagner); women campus leaders more ex- 
troverted and with stronger inferiority feelings (Sward); Roman Catholic 
priests in training as unusually introverted and troubled with inferior- 
ity (Sward); and the attitudes and maladjustments of Jewish students 
(Maller). Maller and Lundeen (699) found a correlation of .55 between 
emotional maladjustment and acceptance of superstitions, within a popu- 
lation of 300 seventh-grade children. Carroll (654) found no correla- 
tion between art tests and temperament tests (Bernreuter, Bathurst) as 
high as .2. The deaf and hard of hearing seem definitely more neurotic, 
introverted, and submissive (Pintner, Welles, Lyon). Pressey found In- 
dian children to have emotional attitude scores like those of white children 
about two years younger. Sumner found negro students at Harvard Uni- 
versity much like the college white groups previously given a psy- 
choneurotic inventory by House. Brotemarkle found no relationship between 
the Bernreuter scores and scholastic aptitude, mental ability, judgment, 
verbal discrimination, common sense, general information, learning ability, 
motor coordination, or moral vocabulary tests. Apparently the Bernreuter 
is also unrelated to the Kent-Rosanoff word-association test (Laslett). In- 
tercorrelations are reported low by others (Pintner, Stagner, Flemming), 
except in cases in which the tests contain many identical elements. 

Many of the investigations reported seem to confirm what anyone familiar 
with the psychology of personality would have expected. It is possible, 
however, to show that competent persons may not anticipate correctly the 
outcome of questionnaire studies. 
Watson and Green (747) showed that graduate students of education 
agreed with recent investigations showing: men more apt than women to 
be happy in marriage, preference for parent of opposite sex, more dissatis- 
faction with marriage in the more highly educated partner, persons with 
siblings of opposite sex better prepared for success in marriage, etc. In 
general these educators were misinformed or in disagreement with findings 
on the wide age range for beginning menstruation, the importance for the 
wife’s happiness of sex adequacy in the husband, the decreasing satisfac- 
tion with marriage after the first five years, the prevalence of serious lack 
of self-confidence, etc. Men were no better informed than women, married 
persons did not differ significantly from single persons in their accuracy of 
estimate, and age (twenty to fifty) was not a significant variable in most 
judgments. 


Traits Tested by Behavior 


British psychologists introduced the concept of perseveration and have 
been active in developing and applying tests. Stephenson (731) used in a 
psychotic population the w-w-w test, the z-z-z test, the e-e-e test, the grouped- 
strokes test, and the saying-color test. Each involves ability to shift from a 
simple practiced series to another enough like the practiced series to cause 
serious confusion. The average intercorrelation of the five tests, each tak- 
ing less than five minutes, was .40. Correlation with intelligence (g) was 
-.26. Melancholia was characterized by high “p” and high “g,” dementia 
praecox by high “p” and low “g.” Manics and paranoid cases showed low 
“p” scores. In a retest after several months no “p” scores deviated by more 
than 2 points on a 20-point scale. Pinard (711, 712) used the inverted S 
test, the triangle test, the mirror image test, and an alphabet test with 300 
socially or mentally handicapped children. Intercorrelations were between 
.3 and .4. Tetrad analysis showed a group factor present in all the tests. 
“P” score increased with age. Of 49 melancholia cases, 37 were above the 
general median in perseveration; of 8 manics, 7 were below the general 
median. Eight of 10 obsession cases were above the median in persevera- 
tion; 27 of 34 hysterics were below the median. Twenty-four of 30 delusional 
cases were in the middle quartile. Trait ratings showed no such clear cor- 
relations with “p.” Persons (children or adults) rated as difficult and un- 
reliable fell (75 percent) in the extreme quartiles of perseverance or non- 
perseverance. Persons rated as self-controlled and persistent fell (75 per- 
cent) in the central quartiles of the perseverance scale. Rogers (719) found 
correlations of .3 between perseverance and marks in reading, writing, and 
spelling; lower correlations with other subjects. Rangachar (715) used 
seven tests of perseveration to compare 38 Jewish and 35 English boys. 
The “p” scores of the Jewish group were higher in every comparison, but 
the differences were seldom statistically significant. Cattell (656) tried out 
eleven tests of perseveration, the more reliable of which had intercorrela- 
tions averaging .2. Correlations with factors of cleverness, will, maturity, 
and adjustment, derived from ratings, were likewise negligible. Eight fur- 
ther tests including speed, reading comprehension, oscillation of am- 
biguous perspective, intelligence, and the psychogalvanic reflex showed no 
significant correlation with any of the other measures except for one result 
of .61 between speed of reading and the “c” factor (surgency). The con- 
cepts are elusive, the tests fragmentary, the results trivial. 
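
The counts reported above for Pinard's clinical groups can be set against what chance alone would give. The sketch below computes a one-sided binomial probability for each group, assuming that the cases are independent and that a case is equally likely to fall on either side of the general median. This is an illustrative check only, not an analysis reported by Pinard; the function name is hypothetical.

```python
from math import comb


def upper_tail(successes, n, p=0.5):
    """Chance of at least `successes` hits in n independent trials."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


# Counts as reported above: (group, cases on the predicted side, total cases).
reported = [
    ("melancholia, above median", 37, 49),
    ("manic, below median", 7, 8),
    ("obsession, above median", 8, 10),
    ("hysteric, below median", 27, 34),
]

for group, hits, total in reported:
    chance = upper_tail(hits, total)
    print(f"{group}: {hits}/{total}, one-sided chance = {chance:.4f}")
```

The smaller groups (8 manics, 10 obsession cases) can reach only modest levels of assurance however lopsided the split, which bears on how much weight such diagnostic comparisons will carry.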

Howells (683) studied persistence (not perseveration) with tests of 
ability to stand fatigue from holding a dynamometer, pain from pricking, 
heat from a grill, electric shock, pinching, and a blunt peg forced into the 
flesh. High scores on these tests tended to accompany willingness to endure 
pain for higher grades, volitional perseveration (Downey), ascendance 
(Allport), religious radicalism, masculinity, being the oldest child, lepto- 
some build, and high academic grades. Wang’s scale (743) for measuring 
persistence involves self-report on 101 direct questions. It correlates .5 
with adjustment on the Thurstone scale. 

“Suggestibility” is a name given to many psychologically distinct be- 
haviors. L. W. Davis and Husband (663) studied the degree of trance which 
a subject developed under a standard hypnotic procedure, and found this 
unrelated to scores for Colgate introversion, fair-mindedness, and Pressey 
X-O affectivity. More intelligent and better adjusted men were more sus- 
ceptible, but this relationship did not hold with women subjects. White 
(750) used ink-blots with and without suggested resemblances. Correla- 
tions among nursery-school children between suggestibility and I.Q. were 
—.40. Stevick (733) applied the progressive weight series as a test of 
suggestibility to 200 adults. Sixty-two percent showed no suggestibility 
and only 20 percent continued to imagine the increase for more than one 
step beyond the point where it ceased. Correlation of religious conformity 
and suggestibility was -.9. Bonte compared suggestibility of children 
when asked questions about a picture they had seen and about a staged 
altercation in the classroom. Eidetics were influenced in 36 percent of their 
answers about the picture, and in 93 percent of their answers about the 
incident. For non-eidetics the corresponding percents were 35 and 838. 
Platonov (713) suggested to three adults in hypnosis that they would 
awake and be only four, six, or ten years of age. They were then given 
Binet tests. The results show an extraordinary correspondence, the ob- 
tained M. A. being within one year of the suggested “regression age” in 
eight of the nine comparisons. Sherman and Crider described a case of 
hysteria in which a regressed patient wrote her name, drew pictures, and 
solved intelligence problems on the younger age level. 

Williams (753) used Hull’s test of forward or backward swaying in 
response to suggestion, and reported catatonic and paranoid types of 
dementia praecox patients giving a negative response, the manics not 
responding at all. 

The Wells Emotional Age Scale was revised by Weber and now con- 
tains problems of eight types: moral association, fear association, interest 
preference, interest association, collecting preference, blended emotions, 
matching proverbs, expressing feeling, and emotional analogies. Retest 
reliability of each type ranged from .33 to .75, for the whole test .78. 
Correlations with intelligence ranged from .27 to .78, for the whole test 
.80. Correlations with C. A. ranged from .20 to .61 with .71 for the whole 
test. Apparently the instrument is a sort of intelligence test. Correlation 
with ratings on emotional age was only .3. 

Willoughby (755) worked out a scale with sixty short descriptions among 
which the rater chooses the appropriate one. Each has been judged to have 
a certain value on a scale of emotional maturity. 


Physiological and Laboratory Tests 


The most ingenious and promising advance in technic during the period 
under review was reported by Lasswell (689). He conducted portions of 
psychoanalyses, under conditions which permitted a record of everything 
said, of the patient’s bodily movements, his pulse rate, and the electrical 
conductivity of the skin. Reflection on the probable correlations suggested 
that when the patient was speaking directly about the analyst, he would 
experience more affect and the pulse rate would probably increase. During 
periods of tension skin conductivity would increase but rate of speaking 
would decrease, and the reverse would be true during absence of tension. 
Formal check was made by studying pairs of days in which the second 
varied significantly along one of these lines, from the day preceding. Thus 
on 20 pairs of days showing increased pulse rate the second day, there was 
an average of nine more references to the analyst during the second inter- 
view. For the reverse pairs, with decreased pulse rate the next day, the 
references to the analyst decreased on the average by eleven. Similar con- 
firmation of the other hypothesis was possible. This is clearly only a be- 
ginning of the interesting interpretations which may be expected from the 
abundant data. 

Dysinger and Ruckmick (665) also used verbal, psychogalvanic, and 
pulse records in their study of the influence of moving pictures on 150 
children and adults. Average P G R deflection for scenes of danger and 
conflict was 2.1 mm. for children, 1.8 mm. for adolescents, and 0.9 mm. 
for adults. For erotic scenes the deflections averaged 0.8 mm. at age nine, 
1.0 mm. at age sixteen, and 0.3 mm. for adults. Repetition of the picture 
decreased the effect. 

Nomura used time of delay in a learned reaction and number of errors 
in the reaction, as a measure of surprise-effect. Rowland suggested events, 
graded in their affective character, to hypnotized subjects, and showed a 
fair correlation between the judgment of emotional excitement and in- 
creased irregularity in a breathing curve. Luria’s book (695) is an outstand- 
ing contribution, perhaps the most stimulating combination of theory and 
practice in this review. He conceives of a “functional barrier” which keeps 
conflict in the higher cerebral centers until a course of action is found in 
which the whole organism can be integrated. Premature emergence of the 
conflict into the muscular system creates disorganized behavior. The func- 
tional barrier is weak in early childhood and in fatigue. Strong emotion 
may cause a short cut or overflow into the motor system giving impulsive 
or diffuse reactions. Luria developed and tested his theories in experi- 
ments involving word association, record of accompanying voluntary and 
involuntary movements, breathing curves, creation of artificial conflicts 
by hypnotic suggestion, tests requiring quick shift from one language to 
another, rhythmical motor reactions, tests involving choice among arbi- 
trary alternatives and alternatives with associated cues, tests which seemed 
easy but quickly became impossibly difficult, voluntary motor action in 
cases with certain nervous disorders, drawings, and other symbolic sur- 
rogates. Unfortunately the book is poorly translated and so badly arranged 
that many readers give up before getting to the last few chapters which 
are the most important. Olson and Jones (707) followed up Luria’s idea 
of disorganization of behavior in conflict situations. They proposed emo- 
tionally toned words and propositions (also control stimuli) to subjects 
for rapid associative reaction. A voice key indicated the time taken to 
respond, and simultaneously the subjects were supposed to press down 
keys with each hand. The emotionally toned material produced significantly 
more vigorous responses. Correlations with self-rankings were about .5; 
those with rankings by friends about .1. 


Hersey (679) observed twelve workers for some 300 two-hour periods 
and estimated their emotional states. In positive moods (elated, happy, 
hopeful, cooperative, pleasant), production, objectively measured, aver- 
aged 102; during neutral moods (indifferent, excited, tense, mixed) 100; 
during negative moods (unpleasant, suspicious, peevish, angry, disgusted, 
sad, pessimistic, worried) production averaged 93. Averages were con- 
sistent in the case of each man. Hartenstein used introspective reports of 
mood for eight subjects given laboratory tests on different days. There was 
no convincing evidence of the effect of weather on efficiency, and the effect 
of mood varied with the individual. 


Treatment 


Rarely are tests reported as part of a total experimental process: test- 
treat-test. When more studies of this type develop, there will probably be 
more enthusiasm on the part of educators for personality measurement. 
McLaughlin (696) used the Allport Ascendancy-Submission Scale and 
ratings by associates with 400 college students, identified extreme cases, 
and then carried on a re-education through interviews, help from friends, 
selected readings, reports of analogous cases, correction of speech and 
other handicaps, and readjustment of environment. The test was not re- 
peated after the training, but another set of ratings was secured from the 
same judges. Among 12 markedly ascendant students, only 5 appeared to 
have changed in the desired direction. Among 13 extremely submissive 
subjects, 12 showed improvement, 4 marked improvement. Laird and 
others (688) selected 53 children who seemed, on the Olson Behavior 
Check List, to be unusually nervous. Diets directed toward building up 
calcium metabolism brought about demonstrable improvement over con- 
trol, particularly in the group taking milk and a food concentrate at the 
middle of the morning. Fajans (667), in one of Lewin’s impressive and 
valuable investigations, showed how ambition and perseverance could be 
built up. Experiments demonstrated a series in which the most powerful 
was success in an assigned task which appeared difficult, said success accom- 
panied by encouragement. Success alone had more effect on future behavior 
than did praise alone. 


Tests Related to Unconscious Reactions 


Tendler (736) gave the Kent-Rosanoff Word Association Test to 50 
psychoneurotics and made an unusually careful analysis of the types of 
response. Frequency score correlated with some of the types of associa- 
tion as follows: contrast, .67; similarity, .50; noun-adjective or adjective- 
noun, —.80. Individuality of response correlated -.3 with contrast, —.5 
with similarity, .60 with adjective-noun or noun-adjective score. He pro- 
posed that association by contrast, superordination, and coordination be 
regarded as adult types; association by contiguity, adjective-noun, or noun- 
adjective as juvenile. In the Rosanoff frequency tables were 44 of the 
former and 16 of the latter. In the Woodrow-Lowell norms for children 
there were 11 of the adult type and 44 of the juvenile. McElwee found sub- 
normal adolescents giving many more reactions of zero frequency than 
were given by normal children or adolescents. Copeland studied frequency 
of usage for 15 words and found unfamiliarity to correlate (rho) with 
reaction time .7, failure to respond .9, with psychogalvanic reflex, zero. 
Fisher and Marrow suggested post-hypnotic moods of elation and depres- 
sion, with a result that word associations were slowest in the depressed 
mood and fastest in a normal state. This was true whether stimuli were 
supposed to be pleasant, unpleasant, or neutral. 

The Schwartz social situation pictures are drawings of children in situa- 
tions in which delinquency seems imminent (724). They are supposed to 
be used in supplementing other forms of examination to get beneath the 
surface with delinquent boys. 

The Rorschach Test was used in more than a score of studies published 
during the past two or three years. Hertz’s recent review (681) of Rorschach 
literature contains 152 items. The popularity of this test arises in part from 
the fact that it is almost unique in approaching personality as a whole 
rather than atomistically. A two-volume edition (721) of Rorschach’s 
original study, supplemented by two other articles, appeared in 1932. 
English descriptions of the test have been written by Beck (640) and 
Vernon (740). Variations of the original blots have been used by Behn- 
Eschenberg, Roemer, and Gordon and Norman. Loosli-Usteri and Vernon 
have published suggested improvements in technic of administration. Hertz 
(681) summarized reports of reliability, found most of them low, and pro- 
posed a standard procedure which raised most of the measures above .6. 
Norms are inadequate and there has been some controversy about the 
necessity for age norms. Loosli-Usteri (693) found few movement re- 
sponses among children nine to fourteen. Bleuler (648) found siblings 
more alike than are unrelated subjects, and Verschuer (741) found 
monozygotic twins more alike in Rorschach measures than are fraternal 
twins. Vernon (740) found some correlation with artistic aptitude. Hertz 
(680) found color response highly correlated with ascendancy and move- 
ment responses with submission. Color response was similarly indicative, 
in her studies, of Woodworth-Mathews symptom score. Wertham and 
Bleuler (748) used the test before and after taking the drug mescaline, 
with resulting evidences for change in personality. Levy and Beck (691) 
tested patients during a manic attack and after recovery. Recovery was 
indicated by increase in number of replies, decrease in color responses, de- 
crease in originals, and increase in responses determined by form. Ober- 
holzer used the test to aid in differential diagnosis of skull injuries. Meltzer 
(703) reported results on stutterers. Beck (640) found mental age among 
the feebleminded, correlated with excellence of form perception .6, and 
with tendency to interpret the picture as a whole, .5. Hertz’s results (680) 
were in agreement: .5 for superior form answers; but she disagreed on the 
significance of wholes and added a significant correlation (.4) with origi- 
nal answers and a negative relation (—.4) with oligophrenic details. Beck 
(641) in another study found agreement between the Rorschach analysis 
and clinical studies in 33 out of 37 cases. 

Wolff (757) recorded the voice, hands, profile, and free recall of a story, 
for each of his subjects. Later each subject was asked to identify others’ 
and his own. He discovered that errors were in the direction of approach 
toward the subject’s ideal of himself as he would like to be. 


Typology 


With the increasing trend toward organismic thinking and the conse- 
quent necessity to try to study the personality in its unitary, integrated form 
rather than in elements like traits, there has come increasing attention to 
characterology and typology. 

The best verified typological difference is that which takes extreme 
form in the commonly accepted categories of psychosis. The schizophrenic 
differs from the cyclic manic-depressive, and a similar difference can be 
observed within the normal population between schizothymes or schizoids, 
and cyclothymes or cycloids. Bowman and Raymond (649), Bigelow (646), 
and Faver (668) have described the pre-psychotic schizoid as seclusive, 
irritable, oversensitive, anal-erotic, narcissistic, feeling insufficient, etc. Small- 
don (726) described the cycloid as pyknic, voluble, hyperactive, extro- 
verted. Eyrich (666) found less similarity among epileptics, but proposed 
three typical syndromes which the Rorschach test helped to identify: (a) 
retardation, impoverishment, inelasticity of mental life; (b) vain and ego- 
centric oversensitiveness and irritability; and (c) restless activity, dis- 
tractibility, tension, but absence of feeling. Willemse (752) used case 
studies, letters, laboratory performance, Rorschach tests, drawings, physical 
measures, etc., in studying adolescent delinquent boys. He agreed with 
Kretschmer’s thesis that the boys with leptosomic build lacked energy, were 
timid, easily influenced, solitary, self-conscious, with silent scheming and 
sexual maladjustment. The pyknics might be sociable optimists, or queru- 
lous cholerics, or undaunted adventurers. Humm and Wadsworth (684) 
standardized a scale for normal, hysteroid, cycloid, schizoid, and epileptoid 
components in temperament. Burkersrode and Ille (650) used self-ratings, 
ratings of others, tests of attention type, flexibility, form and color in- 
fluence, etc., in confirming the Kretschmer typology. Watson (746) re- 
viewed German psychology and summarized results from about 8,000 
cases, in which 50 percent of the schizophrenics were leptosomes, while 65 
percent of the cyclics were pyknics. Revesz (717) listed eleven types largely 
derived from psychopathology. 

Pavlow and his followers have distinguished four types: inhibited, ex- 
citable, labile, and inert. Khozak (686) added a few intermediate types 
as a result of three conditioning experiments with each of fourteen children. 

Two recent books can be added to those giving fairly comprehensive re- 
views of typology, one in German by Rohracher (720) and one in 
Rumanian by Todoranu (737). The latter emphasizes activity, emotivity, 
and psychic force or intensity as the central variables in temperament. His 
tests lead him to believe that two major types in accord with Kretschmer’s 
thesis can be identified. Heymans and Wiersma were among the first to 
collect extensive ratings on character. Wiersma (751) recently published 
the distinguishing characteristics of some 80 persons (among their 2,500) 
who were described as persistent liars. They were rated as emotional rather 
than non-emotional, inactive rather than active, and dominated by primary 
function, i.e., absence of perseveration. This combination makes the “nerv- 
ous” type. Only one was described as continually industrious; the liars 
were usually said to be lazy, violent, fond of variety, repeatedly changing 
their occupation, and very vain. 

Beck (638, 639) from his experience with the Rorschach test has tried 
to build a typology around four variables: form perception, organizing 
energy, affective drive, and creative ability. The interaction of these with 
the particular environment determines the personality. 


Psychoses 


Development of a psychosis is an extreme giving indisputable evidence 
of personality maladjustment. Reference has been made in several of the 
preceding studies to the use of psychotic groups in the validation of 
instruments. 

Mason (701) used commitment to a mental hospital as an index of
emotional adjustment among teachers, and studied 700 such cases. As
compared with the classifications from the general population, the teachers
showed fewer cases of general paralysis, alcoholism, and epilepsy, but more 
cyclic and paranoid psychoses. In heredity likewise these teachers showed 
a large proportion of functional disorders. Among the teachers 77 percent 
were single, while in the general population of the state hospitals only 
about half that many are single. Economic conditions of the teachers were 
superior to those of the unselected cases. Contributing causes present in 
10 percent or more of the cases included: venereal disease, worry over the 
possibility of venereal disease, sex conflicts, excessive masturbation, poor 
health, and school troubles. Only 4 percent of the men reported interests 
in sport, dancing, theater, music, or travel. Reading, study, and religion 
were the common interests. About 60 percent of the cases were regarded 
by friends as sensitive, shy, and seclusive. Two-thirds were graduates of 
normal schools or colleges. 


Comment 


The chaos of theory in relation to emotional growth, neuroses, psychoses,
and psychotherapy seems to have resulted in many deplorably superficial
investigations. It seems easy to buy a printed blank, count the answers, and
compare one group with another. This type of study avoids coming to
grips with any of the genuine issues. The most urgent need in this area of
study is the development and use of measures by persons who are
experienced in the clinic, who are thoroughly conversant with the doctrines
of the neurologists, of Freud, Adler, Jung, and their successors, and who
are at work on some of the basic problems. Sound measurement technics
cannot develop faster than clear thinking and crucial experiments
concerning the nature of the phenomena to be measured.





CHAPTER VII 


Social Attitudes 
Social Behavior 


While psychologists were carrying on the studies of social attitudes
reported in this chapter, our national income tumbled to half its former size
and unemployment rose to include at least one-fourth of the workers of 
the country. Political leaders together with many writers of editorials and 
magazine articles expressed a conviction that basic changes were taking 
place in the attitudes of the American people. The most interesting fact in 
this review of research is that a reader going through the more than two 
hundred scientific studies would find very few attempts to describe or to 
interpret any aspect of this social situation. 

Due to limitations of space, studies deemed less important (about half
as many as those here mentioned) have been omitted entirely. 

One study, Almond and Lasswell’s comparison (760) of docile versus 
aggressive personalities among the recipients of relief, did bear directly 
on the present scene and the forces at work creating attitudes. There has 
been much speculation as to the class or group which will furnish the 
drive toward a new society. Lasswell compared 100 aggressive clients and 
100 submissive clients, and found the aggressive to have had a year longer 
on the relief rolls, to have held more political jobs in the past, and to have 
had more arrests for various offenses. Among the aggressive he also found 
a larger proportion who married outside their national and religious group, 
a larger proportion whose occupations dealt primarily with persons rather 
than things, a larger proportion from dangerous occupations, more shifts 
in occupations, more trade union members, more education, larger in- 
comes, and the payment of higher rentals. The aggressive averaged nearly 
ten years younger, were more largely a native-born group from the sur- 
rounding region, were much more predominantly urban, were less likely 
to own their own homes or any other real property, and were slightly more 
apt to show physical or neurotic handicaps. 

Baker (766) studied race attitudes using primary historical sources 
(newspaper clippings, magazine articles, files of correspondence of or- 
ganizations concerned, interviews with participants, visits to the scene of 
action, etc.) and concerned himself particularly with the relative effective- 
ness of ten organizations which have been active in attempting to attain 
justice and equality for the negro race. An incidental portion of his study 
included questionnaire responses from 200 people who had changed their 
attitude. The opinions pointed to personal contacts with negroes, and
participation in interracial projects as major influences. Baker identified two
broad types of goal (bi-racial versus amalgamation) and two broad types
of method (conference versus force). The immediate trend appears to be
an increasing use of force by groups working toward both types of goal. 

Lasswell and Baker were almost unique among those reviewed in this 
chapter in deriving their measures of attitude from overt behavior rather 
than from verbal expressions on a questionnaire or scale. Kelley and Krey 
(812) in their volume of investigation under the auspices of the Com- 
mission on the Social Studies of the American Historical Association at- 
tempted with Thurstone’s help to construct an attitude test for junior high- 
school children. They listed six basic difficulties which led them to give up 
their attempt to use such scales, and which apply in some degree to nearly 
all of the studies dealt with in this chapter: 


1. Administrators were afraid to permit the use of tests which even suggested 
markedly unconventional positions. 


2. Pupils did not understand the terms and concepts. It was impossible to know 
from their checked response what they had really understood by the proposition. 


3. Attitudes vary in depth, stability, and permanence, and the scale responses give 
no clue as to these differences. 


4. Most people are reluctant to tell what they think about important issues except 
under rare conditions of exceptional confidence. “Only the most rabid partisans are 
willing to divulge their attitudes freely, and for such no test is necessary.” Children in 
school are apt to protect themselves by answering what they think is expected; adults 
by refusing to answer the questionnaires. 


5. There is sometimes a tendency toward compensation. The person lax in conduct
may try, consciously or unconsciously, to make up for this by extreme verbal
condemnation of such behavior.


6. Many of the answers were superficial snap-judgments, given rather to oblige the 
questioner than because of any independent attitude on the part of the subject tested. 


Kelley and Krey reviewed various methods for ascertaining attitudes and 
concluded: “The new-type test is most direct and perhaps, therefore, 
least efficient.” The historian’s comparison of words and acts they would 
rate first, but would put essays, interpreted by the teacher, ahead of scales 
and blanks. The Commission as a whole concluded in their summary 
volume (761) that results must be tested “not in the classroom by teach- 
ers, but in the arena of social and political life and by the long sweep of 
history.” In another chapter, written for but not included in the report, 
Horn and Lindquist (802) recognized many limitations, but concluded 
that if attitude scales were used for description rather than for the
evaluation of attitudes, and were used only in situations in which the testee
had no incentive to conceal his true attitude, they offered excellent
possibilities. These authors observed that progress in scale construction could
not go far without better definition of educational objectives in terms of 
general emotionalized attitudes, and more authoritative description of 
the specific situations in which attitudes should find expression. Other 
reviews, less critical and more in the nature of summaries, have been made 
by Sherman (845), Droba (785), and Fujibayashi. 





Changing Attitudes 


Especially in a society which suffers from the lag of social attitudes 
behind mechanical progress, it is important to investigate how attitudes 
can be changed. Cherrington (779) gave the Heber Harper Test of Inter- 
national Attitudes to nine groups before and after various types of study 
program. All except one (adult women, average age 50, widely travelled, 
from well-to-do homes, meeting weekly) showed movement toward a more 
international outlook. College classes in international relations showed 
considerable gains, but a seminar of advanced students working at Geneva, 
Switzerland, under some of the world’s most eminent instructors showed 
little “gain” in test score. Undoubtedly they gained much from the ex- 
perience, but sudden immersion in the intricacies of international ques- 
tions shattered some of the liberal attitudes they had brought with them, 
and left the students more hesitant about expressing any judgment. Eight 
other groups were tested for attitudes toward war and disarmament before 
and after periods of lecture, seminar, institute, etc. Little change was found 
in groups which had gone through many hours of lecture, study, and dis- 
cussion, but very marked changes were found in two groups subjected to 
brief but powerful appeals. One of these latter heard Kirby Page give an 
address condemning war; the other read the Eddy-Page pamphlet The 
Abolition of War. After twenty-four hours the group who heard the 
speech showed a shift 10 times its P. E., and those who read the pamphlet 
a shift 7 times its P. E. Retest after six months showed that the change 
had shrunk to about half its former size, but was still significantly (4 or 
5 P. E.) different from the initial score. After some months of preparation, 
a group of representatives from several colleges met in a model disarma- 
ment conference, preceding the actual world conference. Students were 
assigned to represent as faithfully as possible the attitudes of some par- 
ticipant country. Commissions like those on the draft convention were set 
up. Professors were on hand to act as consultants, but there was much 
emphasis on student initiative. The conference met for two days in the 
state capitol building at Denver. Participants, 116 in number, were tested
before and after the conference, using the Thurstone Attitude toward War
Test and a new test, devised by Cherrington, using the Thurstone technic, 
on Attitude toward Disarmament. Differences were only about twice their 
P. E. and did not vary significantly from a control group, similar in age, 
sex, and college year, but who had no connection with the model dis- 
armament conference. 
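
The significance statements above are given in multiples of the probable error (P. E.). As a rough aid to the modern reader, the sketch below converts such multiples to standard-error units. It assumes the conventional relation P. E. = 0.6745 S. E. and a normal sampling distribution; neither assumption is stated in the studies reviewed, and the function names are illustrative only.

```python
import math

# Assumption: one probable error equals 0.6745 standard errors (normal curve).
PE_PER_SE = 0.6745


def pe_multiple_to_z(multiple_of_pe: float) -> float:
    """Express a difference given in probable errors in standard-error units."""
    return multiple_of_pe * PE_PER_SE


def two_sided_p(z: float) -> float:
    """Two-sided normal-curve probability of a deviation at least as large as z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Multiples of P. E. taken from the paragraph above; the labels are ours.
# The six-month retest is entered at the lower bound of "4 or 5 P. E."
reported = [
    ("speech, after twenty-four hours", 10),
    ("pamphlet, after twenty-four hours", 7),
    ("retest after six months", 4),
    ("model disarmament conference", 2),
]

for label, k in reported:
    z = pe_multiple_to_z(k)
    print(f"{label}: {k} P.E. = {z:.2f} S.E., two-sided p = {two_sided_p(z):.3g}")
```

On these assumptions a difference of twice its P. E. amounts to only about 1.35 standard errors, which is consistent with the report that the model conference produced no reliable change.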

Biddle (769) tested the ability of students to recognize typical propa- 
ganda tricks and found that a course of ten lessons which was prepared 
to expose these methods produced significant gain in ability to recognize 
and to discount them, even in changed content. He reported a correlation 
of -.36 between knowledge about international relations and susceptibility 
to articles appealing to prejudice. Peterson and Thurstone (833) showed
that motion pictures influenced attitudes (as checked on a printed scale)
toward race, nationality, crime, war, punishment, and prohibition. The
changes persisted over months. De Feo (780) found that the commercially 
exhibited war pictures shown to 15,000 Italian children had the effect of 
glorifying war, and instilling a sense of patriotic duty to fight for one’s 
country. Shuttleworth and May (847) verified the hypothesis that
movie-going children valued clothes as an aid to popularity more than did
children who attended infrequently. Most of the obtained differences were
explained by these investigators as due to selection of group rather than 
influence of the movies. Vincent (861) found about half of 200 adults 
convinced that the theater had influenced their attitudes toward social 
problems. Robinson (839) tested 400 adults before and after radio talks 
on unemployment. As compared with a control group, the radio audience 
had more suggestions for decreasing unemployment (particularly via
unemployment insurance), and showed a decrease in the proportion of
statements rated as doubtful. Chén (778) tried out brief speeches
(pro-Japanese, pro-Chinese, or neutral-factual) on Manchuria and on Oriental
art. He found each type of material effective with persons originally at 
any position on the scale. Marple (824) obtained further data on the
comparative influence of majority opinion and expert opinion, finding, as did
previous investigators, that knowledge of what one’s group thinks or 
knowledge of what authorities answer, brings about marked tendency to 
agree with such influences. In this case group opinion seemed somewhat 
more effective than the opinion of the experts. Marple added the factor of 
age, finding the greatest susceptibility among 300 high-school seniors, a 
middle position for 300 college seniors, and least susceptibility among 
300 adults averaging 40 years of age. Kulp (817) found students of edu- 
cation more responsive to the prestige influence of liberal educational 
authorities than to the similarly offered answers of liberal laymen or 
liberal experts in the social sciences. A retest eight weeks later, using simply 
the Harper questions and no report of answers by experts, showed that 
the prestige influence was still effective on 90 percent of the changed items. 
Kroll (816) applied the Harper test to six twelfth-grade classes in Febru- 
ary and again in June to compare the influence of conservative and liberal 
teachers. Of the three conservative teachers one had an influence clearly 
in the liberal direction, another probably slightly in this direction, and 
the third teacher’s class showed no change in attitude. The three liberal 
teachers all exercised very strong influence in the liberal direction. (Prob- 
ably some of the apparently “liberal” change should have been attributed 
to the content of the course of study.) There is apparently a tendency for 
college students to shift toward liberalism on the Harper test in the tran- 
sition from the freshman to the graduate year. Boldt and Stroud (772) also 
found that liberalism tended to be associated with the number of hours 
of work taken in the social sciences. The data were interpreted to show
the influence of college attendance and types of courses. Annis and
Kirkpatrick found “planted” content in college papers influential. Campbell,
Droba, Salner, and others reported that students are influenced in a
liberal direction by liberal courses in college.





Jack (808) used an interview technic before and after a four-month
course in parent education. Differences on the whole were slight, but as
might have been expected, those who were most backward at the beginning
made larger gains.


Correlates of Social Attitudes 


Another type of study, less direct and convincing, is based upon the 
correlation of differences in attitude with other observed differences. Thus 
Kulp and Davidson (818) found siblings alike in attitudes on interna- 
tional relations to the extent indicated by a correlation of about .3. Wright- 
stone (870) constructed a test of liberalism on race and international and 
politico-national questions, which was administered to about 400 pupils. 
Liberalism was correlated with historical knowledge (.58), number of 
periodicals read (.37), socio-economic status (.28), number of courses in 
the social studies (.19), verbal intelligence (.11), emotional stability (.08),
and masculinity. Harris, Remmers, and Ellison (798) used Harper’s Social 
Study Test with about 300 students at Purdue. Liberalism, in their find- 
ings, was correlated with intelligence (.29 for men, .09 for women), with 
self-estimates of liberalism (.3 to .4), with never going to church (-.65), 
with absence of religious preference, belief in evolution, independent po- 
litical affiliation, with the study of sociology, and with masculinity. Whit- 
taker (866) constructed tests of opinion about government and capital 
and labor for 600 high-school pupils. Rural groups, compared with town 
and city pupils, seemed most strongly devoted to the present government, 
also to favor labor rather than capital. 

Carlson (775) found a correlation of only .21 between scores for 
information and self-rating on the certainty with which opinions about 
presidential candidates (1928) were held. Intelligence had zero correla- 
tion with general certainty, but a correlation of .28 with information. 
Men were much better informed and slightly more certain than women. 
Graduate students were least inclined to assert positive certainty; fresh- 
men were most poorly informed. 

Velecika (859) found no correlation between social attitudes of chil- 
dren expressed in a questionnaire, and observations of their social be- 
havior. These social attitudes were likewise unrelated to intelligence. 

Heber Harper’s extensive questionnaire on international relations pre- 
senting 352 questions on twelve themes, is outstanding because of the 
scholarship behind the statements. Few attitude tests have been prepared 
or criticized by persons of superior competence in the subject matter
concerned. Harper (797) compared results from 1,700 students in eighteen
universities in eight countries. International organization was most popu- 
lar among the students in France and England, least trusted by those of 
Austria. Nationalism was no more prevalent in Germany than in the United 
States. Psychology students were less likely than were students in other 
fields to believe war a necessary institution. Kolstad (814) examined in
more detail 500 Harper blanks filled out by American students and found
liberal international attitudes correlated with: intelligence (.19);
masculinity; a major in history or educational administration rather than one in
nursing, physical education, or household arts; more advanced degrees;
graduation before 1918; travel abroad; independent political affiliation;
residence in the West North Central region rather than in the South
Atlantic; and affiliation with Congregationalists rather than Lutherans. Droba
presented his study of militarist-pacifist attitudes in several articles in 
addition to the one included in the bibliography (784). He found pacifist 
attitudes among students at the University of Chicago correlated with: age, 
femininity, social-science study, independent or socialist political affilia- 
tion, liberal Protestantism rather than Lutheranism or Roman Catholicism, 
absence of military service, foreign parentage, and academic scholarship. 
No relation was found between response to his scale and intelligence, edu- 
cation of parents, occupation of father, or nervous symptoms, as reported 
on a questionnaire. 

Race attitudes were studied by Guilford (795) using paired comparison 
of 15 “racial” (really primarily national) groups, in seven universities. 
Students at New York University differed distinctly from those in other 
schools. The nationality of the students’ parents was related to favor for
that group. Zeligs and Hendrickson (872) used a type of Bogardus social- 
distance scale to study the attitudes of 200 sixth-grade children toward 
39 “races” (again, nationalities). Correlations between attitudes of boys 
and girls, Jewish and non-Jewish, were .85 or above. Acquaintanceship 
made for tolerance in all cases except the negro. Garrison and Burch (791) 
used 35 true-false statements on negro-white relations, chosen from the 
questionnaire prepared by the Social Science Research Council Committee. 
Less than half of the students at North Carolina State College accepted 
such a statement as, “The principles of brotherhood should hold in rela- 
tionships with negroes.” Freshmen were more intolerant than seniors, and 
students from rural communities more intolerant than those from urban
centers. Reckless and Bringen (836) used 40 statements from C. S. John- 
son’s questionnaire on racial opinions to compare with results on 40 
information items. Correlations between information and attitude favorable 
to the negro were .88 in the South, .58 in the North and West. Minard (827) 
found race attitudes fairly well defined by seventh grade, and always far 
below what experts deemed desirable. Intelligence was conducive to better 
race attitudes, but these attitudes showed little relation to sex or to socio- 
economic level within Iowa towns. 

An unhappily large number of studies not reported here dealt simply 
with the attitudes of some college class on a set of opinion statements, or 
compared results among various college classes. In general, college train- 
ing appears to have some liberalizing effect, although the proportion pre- 
ferring Fascism to Bolshevism in Willoughby’s study (867) of Stanford 
University students rose from 60 percent of the freshmen to 75 percent 
of the seniors. Moore and Garrison (829) pointed out that A students 
made 53 percent of the possible radical choices and none of the
reactionary sort, while D students made only 4 percent of the possible radical
choices and 13 percent of the reactionary ones. Conservatives and re- 
actionaries were relatively more frequent in North Carolina State Col- 
lege, less so in Washington State College, and least in evidence at New 
York University. Stalnaker (850) made a Thurstone scale on attitudes 
toward athletics and reported athletes most favorable, college administra- 
tors least so. 

While no studies dealt directly with the attitudes of owning classes 
versus working classes, Arnett (764), in addition to reporting the general 
conservatism of 1,076 schoolboard members taking the Manly Harper 
Social Belief and Attitude Test, analyzed results by vocational and other 
categories. The most conservative were those in clerical, proprietorial, and 
managerial occupations; the most liberal were the professional workers. 
Age seemed to make relatively little difference except that those from 
sixty-one to sixty-five years of age were markedly more liberal than those 
younger or older. Liberalism increased generally for those with more than 
thirteen years of schooling. Independents in politics were, of course, more
liberal than Democrats, and much more liberal than Republicans. Sex dif- 
ferences, sectional differences, and differences due to size of community 
were slight, except that those from cities over 50,000 appeared more lib- 
eral. Uhrbrock (857) constructed (Thurstone technic) a scale of atti- 
tudes toward the company, and administered it to 4,430 employees under 
conditions assuring them that their frank and anonymous opinion was de- 
sired. Most favorable were the foremen, 86 percent of whom exceeded the 
mean of the factory workers. Next came the office clerks, 77 percent of 
whom were better satisfied than was the average factory hand. Correlation 
between attitude toward the company and information about the company 
was —.01; correlation of attitude with intelligence was —.14. 

Grauer (794) tested job satisfaction, mechanical performance, and rate 
among sewing-machine operators. Attitudes toward the tests were rated by 
examiners. Earnings correlated .67 with the performance tests, .42 with 
favorable attitudes toward the tests, and .33 with job satisfaction. Hall 
(796) gave a questionnaire designed to give measures of occupational 
morale, attitude toward employers, and attitude toward religion, to 360 
unemployed and 300 employed engineers. Employed men were more am- 
bitious and favorable in their attitudes toward engineering as a profes- 
sion, and were also more appreciative of employers. Favor toward em- 
ployers among the employed men increased with age. Bitterness toward 
both job and employer increased with increasing period of unemployment 
and with financial desperation of the unemployed men. There were clear
differences between the employed engineers who felt secure in their jobs 
and those who feared a lay-off, the latter approximating the attitude of 
the unemployed. Kornhauser and Sharp (815) used questionnaires and 
interviews to obtain attitudes regarding work from 200 factory girls. Two 
departments, identical except for having different forewomen, showed in 
one case 71 percent of the girls favorable toward their job, in the other
case only 29 percent. Sixteen of the 25 most neurotic were dissatisfied;
only 3 of the best adjusted emotionally were similarly dissatisfied. Equal 
pay for all was favored by 75 percent of those with low efficiency records 
and by 36 percent of those with high efficiency records. 

Interesting studies have been made of differences in attitude among 
racial and national groups. Gardner (790) found negroes no more
different from whites in association tests than two white groups were from
each other. Shores (846) found correlations of .82 between the non-fiction 
reading interests of negro students at Fisk University and white students 
at the University of Chicago. Sex differences were greater than race 
differences. Pressey and Pressey (835) gave their test of emotional
attitudes and interests to nearly 2,000 white children and an equal number
of Indian children above fifth grade. By white norms, the Indians showed 
retardation of two to five years in emotional age. Chant and Freedman 
(777) reported Canadian and Chicago students alike in nationality
preferences. Thomson (856) showed a picture designed to arouse sympathy, to
about 50 each of German, Esthonian, and Russian young people. Subjects 
reacted by reporting emotions felt, by drawing lines unconsciously, by 
giving color reactions, by telling what action they would take, etc. The 
most expressive (perhaps partly due to the conduct of the experiment in 
the German language) were Germans, next Russians, least Esthonians. 
Boys were more expressive than girls. Kohler (813) reported cultural dif- 
ferences in his statement that attitude scales would be little used in Ger- 
many because Germans objected to being questioned about such per- 
sonal matters. Madden and Hollingworth (822) found white judges and 
Chinese judges to use different standards in judging the beauty of white 
adolescents. 

Lewerenz (820) gave his Questionnaire Orientation Test, which is con- 
cerned with opinions about health, education, leisure, home life, culture, 
etc., to juvenile delinquents, adult prisoners, policemen, and superior 
adults, finding the average scores to increase in the order named, with the 
last group much superior to the other three. Hendrickson (799) compared 
87 sophomores in a teachers college with a group of experienced women 
teachers and found the latter to be: older, less interested in dances and 
other social activities, less interested in light entertainment, less inclined 
to physical exercise, more apt to go to concerts or to visit art galleries, 
and more interested in stocks and bonds. 

Attitudes differ not only among racial, national, and occupational groups, 
but change over a period of time within any given group or section of 
society. Bain (765) found that the mental hygiene approach had so per- 
meated popular thought that incoming students in 1932 agreed much more 
closely with Wickman’s mental hygienists than did corresponding groups 
in 1927. Acheson found deans of women reporting considerable liberali- 
zation in their attitudes. Montelli (828) used the technic of asking, “What 
would you do if you had a magic cap?” to show how much more
practical, concrete, and social are the interests of Russian youth today than
were the sentiments expressed before the revolution. Schank (840)
analyzed two changes in rural community attitudes, one in which vested 
interests developed sentiment for a particular site for a consolidated 
school, and another in which private attitudes toward baptism and card 
playing triumphed over the previously maintained publicly acceptable 
stand. 


Moral Values and Religious Attitudes 


Among the twenty-five studies of moral and religious attitudes, the most 
numerous were Dudycha’s detailed reports (787) on student beliefs in 
religious doctrines, superstitions, social and moral problems, evolution, 
etc. Binnewies (770) tested religious beliefs before and after eight lec- 
tures which succeeded in bringing about a marked shift toward liberal- 
ism. Betts, Clem and Smith, Durea and Simpson used the old technic of 
getting names of types of offenses ranked in order of badness, with the 
outcome no more illuminating than has usually been the case. 

Since Wickman’s original study of the seriousness with which teachers 
as contrasted with mental hygienists rated various offenses, several others 
have used a similar technic: Stogdill (853) compared ratings by parents, 
students, and mental hygienists. Watson (863) showed that the mental 
hygienists did not agree among themselves any better than the teachers or 
parents agreed with them, that the offenses were stated too vaguely and 
generally to make possible any reliable rating or valid interpretation, and 
that the questions given to the two groups were not really comparable. 
He argued that such studies showed only that either group, both groups, 
or neither group may be right. Mathews (825) asked 500 students and 
50 faculty members which of several described cheating incidents they 
could justify. Laxer standards appeared to prevail among students, men 
and upper classes. Schneckenburger (842) compared attitudes of children 
from proletarian homes with attitudes of children of similar age from 
middle-class or bourgeois background. No differences appeared in
reactions to pictures of boys (a) attacking a girl or (b) sticking a
sleeping grandfather with a pin; but on (c) a picture of a thug clubbing a
rich man, the testimony (oral) of the proletarian children showed 28 
percent approving and 15 percent excusing, while among the bourgeois 
children only 10 percent approved and none suggested excuses. Schme- 
ing’s questionnaire (841) on negative ideals among 1,000 school children 
showed the importance of studying not only what children seek, but equally 
what they wish to avoid, e.g., contempt of their fellows, unemployment,
or illness. 

A new approach to the problem of moral knowledge, and one which 
might well be adapted to other problems, was worked out by Yates (871). 
He analyzed 150 suits for slander and developed the accepted principles. 
On the basis of these principles he constructed a questionnaire for children, 
revealing the areas in which they needed instruction. 

Gilliland (792) used 40 statements supposed to test superstition after
a psychology course and found an average reduction from about ten
errors to six errors. Whitelaw and Laslett (865) gave the Nixon list of
superstition statements to their classes. Caldwell and Lundeen (773)
reported extensive studies of superstition among secondary-school pupils.

The Allport-Vernon Scale of Values, based on the six types suggested 
by Spranger, was further tried out by Cantril and Allport (774) on 2,755 
persons. Reliabilities appear satisfactory except for “social” value. 
Graphologists’ rating for esthetic value correlated .4 with the test
indication; ratings on economic value (.3), and theoretic (.25) were also
indicative of some relationship, but the other lines measured showed no
agreement. Pintner (834) found students majoring in mental testing rating
higher on esthetic, lower on political, and in the case of the men, higher on
theoretical, values than were the A-V norms. Men in administration
appeared low on esthetic value. Intelligence correlated .4 with social values,
.2 with theoretical values, and —.4 with economic values. Class marks were 
similarly related positively to social values and negatively to economic 
values. Self-ratings varied from .68 for religious values to —.02 for 
political values. Attitude toward the church (Thurstone scale) correlated 
—.78 with religious values as measured on the Allport-Vernon Scale. 

The most comprehensive analysis of religious attitudes published
during the period of this review is Woodward’s attempt (869) to correlate
adult religious views with childhood emotional development. Conservative 
religious attitudes were found among those who had grown up in religious 
homes, had experienced strong sense of guilt and sex shame, had not re- 
belled against home discipline, had experienced cooperation and com- 
panionship in the family, and had been dependent on parents. Vetter and 
Green (860) found members of the Association for the Advancement of 
Atheism typically to have grown up in a strictly religious home, to have
left the church during adolescence, to have been unhappy in childhood,
to have lived in cities over 25,000 population, and to attribute their 
atheism to wide reading and disgust with religious hypocrisy. 

Donnelly (782) developed a test for high-school students to measure 
faith in God. Part I was a vocabulary test to make sure of adequate under- 
standing. Part II contained twelve items concerning conduct, with oppor- 
tunity to indicate how much conduct would probably be influenced by 
faith in God. Part III consisted of statements showing tendency to trust in 
God. These were scaled by 120 judges. Part IV consisted of statements of
belief about God, more intellectual in form. Reliabilities were .68, .72, .4°,
and .83 for each of the four parts; .88 for the total test. A study of Jewish 
students was made by Nathan (830). About 60 percent reported impersonal 
concepts (including naturalism, skepticism, and agnosticism) of God, while 
33 percent more nearly approached the orthodox personal concept. The 
proportion in these groups was unrelated to extent or type of religious
instruction previously given.

Interests: General and Vocational


Katz, Allport, and Jenness (811) asked over 4,000 college students their 
attitudes toward study, social life, fraternities, instructors, religion, music, 
athletics, academic freedom, personal problems, cheating, and many other 
matters. The results have been published, but hardly lend themselves to 
summary. Sheldon (844) reported on vocational, recreational, service, and 
home interest activities of 1,087 boys and girls in the upper grades. Hildreth 
(800) studied preferences of over 200 adolescents for games, companions, 
school activities, reading, and vocations. Interests of Chinese youths were 
studied by Stowe (854), Chang (776), and Webster (864). Denisov (781) 
studied interests of upper-grade children in the Soviet Union and Antipoff 
(762) asked ten “preference” questions of 760 pupils ten to fourteen years 
of age in Bello Horizonte, Brazil. Drake (783) tried out an interest test and 
the Pressey X-O in a fruitless endeavor to improve prediction of college 
scholarship. Thomas (855) compared interests of public school and de- 
tention school boys, finding delinquents more likely to read Hearst news- 
papers, less likely to use libraries, more frequent in movie attendance, less 
apt to have a radio at home, but differing little in preferences as to games 
and active recreation. Reading interests were studied in the Soviet Union 
by Ivanov (807) and among 25 negro women by Hudson (803). Witty 
and Lehman (868) found rural children making more collections than did 
city children. Jones and Conrad (809) reported that country people liked 
action movies, especially “Westerns.” 


Fryer (788) has published a thorough review of interest studies, and 
also a suggestion (789) of five ways (control samples, selection of samples, 
group differences, extraneous criteria, and use) of validating interest 
measures. Schuwerack (843) used the device of asking young adolescents 
what they would do if they should suddenly become rich. Investment with 
a view to practical returns, provision of security for loved ones, and for- 
eign travel were the majority types of response. Hildreth, Illge, Jones, and 
Wallace have published questionnaires to inventory interests. 

Hurlock and Jansing (804) asked 1,132 boys and girls, age fourteen to 
eighteen, negro and white, concerning the vocation they would like most 
to follow, the one they are most likely to follow, and the vocation their 
parents would wish. Negro boys chose teacher, civil service, and machinist; 
white boys chose aviator and engineer. Negro girls chose teaching almost 
exclusively, while for white girls teaching rated below business. Over 
three-fourths of the pupils chose occupations other than those of their 
parents. Vampa (858) reported on the use of a questionnaire together with 
tests of attention and memory, as a means for constructing profiles which 
aided in the vocational guidance of 120 Italian boys. Lehman and Witty 
(819) found pupils most apt to choose the occupation they had rated as 
a good money-maker, next to choose the one they thought of as most 
respected, and least apt to choose the one they thought of as easiest.
Proportions choosing by money and esteem were greatest at age twelve;
those choosing by ease were the youngest group studied (age eight). From
their study of play interests, cited in the previous Review of Educational 
Research, Lehman and Witty compared vocational interests at various ages 
and showed that such interests cannot be permanent. Thus boys choosing 
certain types of engineering ranged from 3 percent at ages eight and nine 
to 23 percent at about age sixteen. Maller (823) studied the Edison
scholarship candidates, as compared with other high-school students, finding
the Edison scholarship boys: more apt to come from homes of workers, 
farmers, and professional men, less apt to come from commercial occupa- 
tions; more apt to have scientists in the family; more apt to own scien- 
tific apparatus; more apt to intend going to college; just like the others 
in the hours given to sleep, work, and recreation; more apt to choose 
engineering as a vocation, swimming as recreation, and experimentation as 
a hobby. The Edison scholars were almost twice as apt to believe present 
relations between capital and labor satisfactory (86 percent to 37 percent 
and 51 percent in two control groups). 

Goodfellow (793) used the Strong Vocational Interest Test with
prospective teachers and compared the A (possessing interests of successful
teachers) and the C (not possessing such interests) groups. The A group were
superior in academic average. The 18 A women were less introverted (Col- 
gate), less emotional (Thurstone), more ascendant (Allport). Personality 
differences with the 12 men were not reliable. 


Other New Tests 


The following may be added to scales and tests mentioned in connection
with specific projects:

Beyle (768): scale for measuring attitude toward candidates for elective
governmental office.

Beyle and Parratt (767): scale for measuring severity of the third degree.

Israeli (806): measurement of judgments about the future.

Nystrom (831): measurement of Filipino attitudes toward America.

Remmers, Brandenburg, and Gillespie (838): measurement of attitudes toward
the high school.

H. N. Smith (848): scale for measuring attitudes toward Prohibition.


Stauter and Hunting (851) tried out a test of social contacts based upon
a questionnaire asking for the number of persons in each of 191 categories
(e. g., baseball fans, art students, Methodists, Frenchmen, etc.) who are 
known by name and who know the subject by name. Correlations with 
intelligence, psychology grades, and ability to associate pictured faces 
with names, were all low. Those mentioned in the college newspaper scored 
higher than those never so mentioned. 

Stevick (852) tested conformity by asking subjects to encircle first what 
they believed, later what they thought most people believed. The more 
schooled were slightly more independent in belief (r = .25). 

Katz and Braly (810) asked 100 students to select from a list of 84 
adjectives the five which the students considered most characteristic of
each of ten national or racial groups. In the case of negroes, 50 percent of
the 500 votes cast were confined to five traits. The Chinese and Turks re- 
quired twelve and sixteen traits respectively to include 50 percent of the 
poll, indicating less of the adjective-stereotype. 
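
The index described above, the smallest number of traits needed to take
in half of all votes cast, may be sketched as follows. The function name
and the ballots are hypothetical illustrations and are not data from the
study; the sketch also counts whole traits only, which is an assumption.

```python
from collections import Counter

def traits_for_half(votes):
    """Smallest number of traits that together cover half of all votes."""
    counts = sorted(Counter(votes).values(), reverse=True)
    total = sum(counts)
    running = 0
    for n, c in enumerate(counts, start=1):
        running += c
        if 2 * running >= total:  # half (or more) of the poll is covered
            return n
    return 0

# Hypothetical ballots: each entry is one vote for one adjective.
concentrated = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
diffuse = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

print(traits_for_half(concentrated))  # few traits needed: strong stereotype
print(traits_for_half(diffuse))       # many traits needed: weak stereotype
```

A small value indicates that judges agree on a few adjectives; a large
value indicates that votes are scattered.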

Israeli (805) found it interesting to ask students to predict divorce rates 
for years to come, and to give the most probable date for the decline of the 
West as described by Spengler. 


Contributions to Technic 


In his review of 125 titles on measuring attitudes, Droba (785) classified 
methods as: absolute ranking, case method, relative ranking, graphic rat- 
ing, paired comparison, and the scale of equal-appearing intervals. 

It has become accepted as good technic in making attitude scales to use 
Thurstone’s method of equal-appearing intervals. Statements are sorted 
into piles (usually eleven) by judges. Statements which appear consistently 
in a given position may be chosen as representative of that degree of the 
attitude. The scale value of each statement is calculated. Hinckley (801) 
showed that statements on the negro were given approximately equal scale 
value whether sorted by northern or southern, white or negro judges. 
Wang (862) suggested sixteen criteria for the original selection of state- 
ments. Miller (826) criticized the Thurstone scale, because he found that 
the average person checked items ranging over 7.2 scale units while the
scale itself had only 10.8 units in its entire range. He held that it is not 
justifiable to stop with the opinions of the original judges, but maintained 
that the actual responses of large voting populations should be used fur- 
ther to refine the scale. 
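
The sorting procedure may be illustrated by a minimal sketch. It assumes,
as the method is commonly described, that the scale value of a statement
is the median pile assigned by the judges and that the interquartile range
(Q) serves as the index of ambiguity; the text above says only that a scale
value "is calculated." The function name and the pile numbers are
hypothetical.

```python
from statistics import median, quantiles

def scale_value_and_q(placements):
    """Return (median pile, interquartile range Q) for one statement."""
    q1, _, q3 = quantiles(placements, n=4, method="inclusive")
    return median(placements), q3 - q1

# Hypothetical pile numbers (1 to 11) assigned by nine judges.
agreed = [8, 8, 9, 9, 9, 9, 10, 10, 10]      # judges place it consistently
scattered = [2, 3, 5, 6, 7, 8, 9, 10, 11]    # judges disagree widely

for name, piles in (("agreed", agreed), ("scattered", scattered)):
    value, q = scale_value_and_q(piles)
    print(name, "scale value:", value, "Q:", q)
```

Under these assumptions a statement with a small Q would be retained as
representative of its scale position, and one with a large Q discarded.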

The Bogardus Social Distance Scale (771) was built after asking 100 
judges to distribute 60 statements indicating degrees of intimacy or social 
relationship into seven piles. Seven equidistant statements of situations 
were thus selected. Lists of 40 races, 30 occupations, and 30 religions have 
been prepared, with suitable instructions. Each is judged by whether people 
of that kind would be admitted to marriage, friendship, work, neighbor- 
hood, acquaintanceship, or citizenship. Zeligs and Hendrickson found 
nearly 90 percent agreement between the scale and personal interviews with 
13 sixth-grade children. M. Smith (849) determined intimacy scale values 
for 16 statements of social distance as judged by 65 college students. 

The question of the number of degrees of acceptance or rejection has 
been often argued and sometimes investigated. Pemberton (832) found the 
+3 to -3 scale the most reliable. Likert (821) in a careful study of
attitudes on internationalism (24 statements), the negro (15 statements), 
and imperialism (12 statements) used some three point, some five degree, 
and some multiple choice responses. He found that the determination of 
sigma values for each of the five possible answers did not greatly improve 
reliability as compared with the simple method of counting 1 for the 
answer at one extreme, 2 for the next, 3 for the middle answer, then 4, 
and 5 at the far extreme. Reliabilities (split halves) varied from .80 to
.90, and retest after 30 days gave results just as high. Reliability of the
Thurstone-Droba War Scale, scored by Likert’s simple 1-5 method, proved
somewhat higher than when the regular Thurstone scoring was used.
Complete directions for constructing an attitude scale are given in this
monograph.
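
The simple 1-5 scoring and a split-half reliability may be sketched as
follows. The sketch assumes an odd-even division of items and the
Spearman-Brown correction, neither of which is specified in the text
above; the reversal of oppositely worded items, the function names, and
the answer sheets are likewise hypothetical.

```python
from math import sqrt

def likert_score(responses, reversed_items=()):
    """Sum 1-5 responses, flipping items worded in the opposite direction."""
    return sum(6 - r if i in reversed_items else r
               for i, r in enumerate(responses))

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(sheets, reversed_items=()):
    """Correlate odd- and even-item half scores, then apply Spearman-Brown."""
    odd, even = [], []
    for sheet in sheets:
        keyed = [6 - r if i in reversed_items else r
                 for i, r in enumerate(sheet)]
        odd.append(sum(keyed[0::2]))
        even.append(sum(keyed[1::2]))
    r = pearson(odd, even)
    return 2 * r / (1 + r)

# Hypothetical answer sheets: five subjects, four statements each.
sheets = [[5, 4, 5, 4], [4, 4, 3, 4], [2, 1, 2, 2], [1, 2, 1, 1], [3, 3, 4, 3]]
print([likert_score(s) for s in sheets])
print(round(split_half_reliability(sheets), 2))
```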


Comment 


The reviewer ventures to offer the following suggestions for the
improvement of some studies of social attitudes:


1. Let the study be an outgrowth of activity and thought in connection with major 
social needs. Scientific resources are too limited to be wasted upon the inconsequential. 

2. Let the subjects and situations more frequently be found outside of college 
classrooms. 

3. Let those data be collected which will check some hypothesis—one which has a 
good chance of being both true and significant. The random collection of personal 
data and test scores for correlation seems never to bring rich reward. 

4. Let there be more attempt to find indices of social attitude which arise and 
validate themselves in the activities of the economic, political, and social struggles. 
Psychologists and educators have shown an inexcusable addiction to paper and pencil. 

5. Let the necessary scale be constructed after much more thorough, intimate, and 
discriminating acquaintance with the best scholarship concerning the issue. Many of 
the so-called “radical” positions in existing scales, for example, would be disowned 
by any intelligent radical. 

6. Let the refinements of accuracy in mathematical units wait until the many errors
which common sense could eliminate have been removed. Statistical attack has too
often substituted for, rather than followed upon, intelligent and thoughtful analysis of
the actual behaviors involved.





CHAPTER VIII 


Measures of Character and Personality Through 
Conduct and Information 


The studies for the period have been treated in four large divisions. The
first includes publications primarily concerned with contributions to 
method. These are discussed under the headings of systematic observation, 
ratings, tests, expressive movement, psychogalvanic measures, physiological 
factors, inventories, and factor analyses. The second division represents 
an alphabetical arrangement of characteristics, traits, or constellations of 
behavior in which the investigator has been interested. The patterning of 
character and personality measures has been reserved for the third section. 
Studies stressing the prediction of achievement and the relationship between 
variables are included here. Investigations of the effect of direct or indirect 
instruction on knowledge or modification of behavior have been discussed 
in the fourth large subdivision. 


CONTRIBUTIONS TO TECHNIC 
Systematic Observation 


Growth in the development of time-sampling technics was described in 
the summary of seventy-six titles by Olson and Cunningham (997). The 
emphasis on overt behavior and the range of problems attacked was illus- 
trated in a list of thirty-eight categories of behavior to which the approach 
had been applied. Practices employed in time-sampling studies were dis- 
cussed under the following headings: 


1. Definition of behavior: action criterion, impression criterion, social-stimulus criterion,
unit of behavior, category

2. Timing and sampling: time-sample, distribution of time-samples

3. Distribution of observer’s attention: individual observation, group observation,
scanning

4. Method of recording: intermittent recording, continuous recording, photo-recording

5. Method of scoring: time-sample score, frequency score, derived score

6. Conditions of observation.


Bott (889) has summarized in discussion and tabular form the practices 
in some of the major observational studies as to the selection of categories, 
the quantification of measures, devices for recording, methods for measur- 
ing reliability, and methods of expressing results. 

The development of recording technic has been illustrated in the stilled
motion pictures of D. S. Thomas, Loomis, and Arrington (1020), the
photo-sampling of Olson and Wilkinson (995), and the intermittent photography
of Swinton (1018).

The period has also witnessed an increased advocacy of what might be
termed anecdotal, diary, or journal types of recording. A single observa- 
tion represents measurement in the sense of presence or absence, and may 
preserve more of the matrix from which the behavior emerges than more 
delimited or controlled observations with established reliability. In general, 
such observations have been employed for their guidance and counseling 
values rather than for measurement, research, or controlled appraisal. 
Notable studies have been made and others are in progress making more 
systematic use of records of incidental material. Space requirements and 
the task of this Review have dictated the elimination of the numerous studies 
pursued by informal methods of observation. 


Ratings 


Rating methods have very definitely grown in favor during the period 
covered by this Review. Weiss (1025) reviewed 131 titles dealing with both 
self-ratings and ratings by others. Her opening paragraph is of interest: 

Whether we like it or not, rating scales are forging their way into research circles. 
Not only are they of value in lieu of more objective measures, but they are being found 
applicable themselves with a relatively high degree of objectivity. The supposed death 
blows first dealt them by Rugg and Thorndike have proven only temporary checks.
What then seemed unavoidable flaws have now been found remediable through
development of new techniques (1025: 185).


Conrad published a series of articles (906, 907, 908, 909) in which he 
discounted the importance of many of the factors which have been re- 
garded as disadvantages of the rating method. His setting was the nursery 
school where superior conditions for observation exist, as contrasted to 
the casual limited contacts typical of certain work or educational
situations at later levels. In addition to showing that valid and reliable data
may be secured, he called attention to a problem which has been some- 
what overlooked in rating. Studies hitherto have tended to emphasize differ- 
ences among traits, among raters, and conditions of rating. By treating the 
trait ratings on each subject as a sample, he is able to show that the re- 
liability of rating is in part a function of the child; i.e., some children can 
be rated reliably, others not so reliably, and a given trait may be rated 
reliably in a child of outstanding characteristics, and unreliably where the 
trait is not particularly applicable to the child. 

The trend from both direct observation and rating suggests that an eye- 
witness record of behavior or a rating judgment based on the impression 
from a series of events may possess more validity than an abstracted ques- 
tionnaire or test instrument from which one must infer a relationship to 
the behavior. The disadvantage of the direct approach is the inconvenience 
of securing the record and the inaccessibility of certain aspects of the ex- 
perience and personality to direct observation. 

Tests


Investigators have continued to employ the tests, or modifications of test
technics, developed by May, Hartshorne, Maller, and others. The Character
Education Inquiry was largely completed and reported in the Review of
Educational Research for June, 1932, on character and personality tests.
The new tests involving performance or information have been reserved for
topical treatment later in the chapter. Measures involving self-rating or
reporting have been described in the two preceding chapters.


Expressive Movements 


An increasing number of attempts are being made to generalize more 
widely on the specific data of peripheral expression so as to regard voice, 
gesture, graphology, and movement as reflections of well-organized dis- 
positions in personality. The descriptive and analytic studies have been 
omitted in conformity with the general purpose of the present Review.
The journal, Character and Personality, is the most prolific source for both
European and American contributions to the study of expressive movement.

A survey of the literature and experimental work in the book by Allport 
and Vernon (877) constitutes a useful beginning for students of the
problem. Part A treats the problem of the consistency of individuals in respect
to their style of expression and their habits of gesture. Experiments were
conducted to measure speed, tension, extent, variability, pressure, and
other peculiarities of natural movement. Intercorrelations were presented 
between a variety of measures in search for group factors. A need was 
shown for the interpretation of certain types of data in terms of psychologi- 
cal equivalents, for there was some evidence that “measures which do not 
correspond statistically may nevertheless be congruent psychologically.” 
Part B reviews critically recent experimental work in graphology and 
presents two experiments designed to test the skill of graphologists and 
laymen. The writers have a feeling that perhaps too much has been claimed 
for graphology by graphologists and too little by psychologists. Both ges- 
ture and handwriting reflect an essentially stable and constant individual 
style. Experimental material on handwriting has been brought into closer 
relationship with personality traits (899, 943), or given special study from 
the point of view of the reliability of interpretation of the material itself. 

In general, studies of body type have been excluded from the present 
Review. From the point of view of expressive movement, however, the 
traces left in the face by habitual expression, typical posture, and methods 
of walking and running must be regarded as evidences which can be tested, 
even though the validity for personality study is not always clear or 
demonstrated. 

Blake (888) studied bodily expression, excluding the face except as 
added information for the interpretation of the data. Subjects were supplied 
with five sheets with nine pictures on each sheet. The sheets represented
the human figure in various poses, and with the figure divided so that
expressive movements by head and shoulders; torso and arms; feet, legs, and
hips; full figure without face; and full figure with face could be studied
separately. For each sheet, the subject was required to give nine responses 
as to whether the figure showed horror, frenzied anger, feebleness, tender- 
ness, antagonism, stealth, egotism, embarrassment, or resignation. He con- 
cluded that one tends to interpret certain bodily expressions as indicative 
of certain dominating mental or emotional states, and that the ease of so 
doing increases as the number of bodily agents involved increases. Further, 
he contended that there is an improvement with training, and that adults 
exceed children in the ability. Blake pointed out that, regardless of origin, 
bodily expressions constitute a part of the problem of human relations 
and may be developed or modified consciously for these ends. 

Further evidence of the importance of expressive movement is found in 
the study by Goodenough (934) of the judgment of emotional states in 
infancy. Eight photographs representing infantile emotions were submitted 
to 68 students in child training courses, together with descriptions of the 
states. The students were required to match the pictures with the descrip- 
tions. Correct judgments were made in 47.4 percent of the cases, which 
is 5.7 times the number expected by chance. It is possible that such emo- 
tional expression may be more readily observed before the age of learned 
inhibitions and substitutions. 
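
The chance rate is not stated in the passage above, but it can be
recovered from the two reported figures. The brief check below uses only
the 47.4 percent and the factor of 5.7 given in the text; the number of
alternatives it arrives at is an inference from those figures, not a
detail reported here.

```python
observed_percent = 47.4   # percent of correct judgments, as reported
times_chance = 5.7        # reported multiple of chance expectation

# Chance rate implied by the two reported figures.
implied_chance = observed_percent / times_chance
# Number of equally likely alternatives that would give that chance rate.
implied_alternatives = 100 / implied_chance

# For comparison: the ratio if each picture had only eight alternatives.
ratio_if_eight = observed_percent / (100 / 8)

print(round(implied_chance, 1))
print(round(implied_alternatives))
print(round(ratio_if_eight, 1))
```

The reported ratio is thus consistent with a chance rate of roughly one in
twelve rather than one in eight, which suggests that the descriptions
offered for matching outnumbered the eight photographs.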


Psychogalvanic Studies 


The psychogalvanometer has played and is playing an important role 
in the literature of personality studies. Landis (963) reviewed 247 titles 
covering the literature from 1929 to May, 1932. It is clear from the review 
that persons interested in applied uses cannot expect much immediate help 
from these instruments. The interest and value of the technic from the 
point of view of research are obvious. Landis’ conclusion for the time 
being is of interest: 

From this literature, giving the results and conclusions of many investigators, the 
reviewer is convinced that there is really no adequate evidence that these electrical 
phenomena of the skin are of necessity associated with any psychological event. They 
are, as Wang pointed out, strictly physiological in nature and probably have a marked 
and important psychobiological significance. There is really no justification for anyone 
using any present galvanometric technique or method as a measure of, or a criterion
of any of the traditional psychological categories, personality traits, or social
relationships of the individual (963: 275).


Many investigators have found, as in the case of Darrow (968: 57-261), 
that the correlations between various questionnaires concerning emotional- 
ity and the psychogalvanic responses often tend to be sufficiently large to 
indicate a group trend, even though not significant for individual prediction. 


Physiological Studies 


Investigations into personality by physical and physiological technic 
were reviewed by Larson and Haney (966). They submitted a lengthy 
bibliography and described some of their own work by means of the
polygraph technic (see later section on lying). By using a falling chair device,
Ray (1007) demonstrated that real changes appear in the pulse rate, 
respiration, and inspiration-expiration ratio of children. Investigators inter- 
ested in body chemistry as indexes to personality should examine the review 
by Rich (1008). Acid-base equilibrium and creatinine production appeared 
to be definitely related to emotional excitability, although the explanatory 
interpretation is not yet clear. Experiments dealing with calcium and phos- 
phorus have usually been equivocal or negative. Children showing behavior 
problems appeared to be somewhat more likely to show endocrine disorders 
according to the investigations of Rowe (1011). 

Reference should also be made here to the attempt to relate food and
character (885) and to the compilation of references to psycho-dietetics
by Fritz (928).


Inventories


Inventories furnish a point of departure for the construction of various 
types of instruments for the study of character and personality. Useful 
ones have been prepared by Ackerson (873, 874) on the basis of behavior 
problems of children; by Conrad (905) for behavior ratings of nursery
school children; by Krout (962) for gestures; and by Allport and Vernon 
(877) for expressive movements. Baumgarten (882) listed an inventory 
of character traits with 1,629 terms in connection with her study of char- 
acter traits. The social setting of personality problems was emphasized 
in Cantril’s review (900) of 306 titles which cover such topics as fads 
and fashions, conversation, humor and laughter, imitation, suggestion, 
creeds, revival meetings, legends, patriotism, gossip and rumor, religious 
cures, friendships, leadership, customs, clothing, newspapers, radio, race 
attitudes, language, revolution, and industry. 


Factor Analysis 


Thurstone (1021) reported his work on multiple factors in his presi- 
dential address before the American Psychological Association in Septem- 
ber, 1933. He pointed out the inadequacies of Spearman’s two-factor or 
single-factor method in accounting for the multi-dimensionality of mental 
traits. As illustrative material, he employed a number of specific studies 
of personality—one on a list of sixty adjectives descriptive of personality, 
which yielded five group factors; one on the insanities, with the whole 
range of psychotic symptoms reduced to five clusters; another on the voca- 
tional interests of college students; and a fourth on radicalism as a com- 
mon factor. Spearman has published requests in several countries for 
cooperative work in personality investigation. Studies under the plan have 
been begun in the United States. 

Maller (980) has studied the intercorrelations of four growth scores
for honesty, cooperation, inhibitions, and persistence, contained in a study
by Hartshorne, May, and Maller. They are uniformly positive and suggest
the presence of a general factor in character. The writer believes that this 
general factor is the readiness to forego an immediate gain for the sake of 
a remote or later gain. 

Jenkins and Ackerson (956) described card-sorting methods for track- 
ing down types of clusters of individuals with characters in common. The 
method is applicable only to large bodies of data. As applied to behavior 
problems of children, it is possible to secure clusters of items in which 
the obtained frequency of association exceeds the chance expectancy. 
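
The comparison of obtained with chance frequency may be sketched as
follows. The sketch assumes that chance expectancy means the joint
frequency expected if two items occurred independently; the text above
does not give the formula, and the function names and counts are
hypothetical.

```python
def chance_expectancy(n_cases, n_with_a, n_with_b):
    """Expected number of cases showing both items if they were independent."""
    return n_with_a * n_with_b / n_cases

def association_ratio(n_cases, n_with_a, n_with_b, n_with_both):
    """Obtained joint frequency divided by the chance expectancy."""
    return n_with_both / chance_expectancy(n_cases, n_with_a, n_with_b)

# Hypothetical tally from sorted cards: 1,000 cases, 200 showing item A,
# 150 showing item B, and 60 showing both.
expected = chance_expectancy(1000, 200, 150)
ratio = association_ratio(1000, 200, 150, 60)
print(expected, ratio)
```

A ratio above 1 marks a pair of items that occur together more often than
independence would predict, and hence a candidate for a cluster.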


CHARACTERISTICS, TRAITS, AND CONSTELLATIONS OF BEHAVIOR 
Ascendance-Submission


Self-report methods of describing ascendance-submission, extroversion- 
introversion, domination-compliance, or similar characteristics are in- 
cluded in another chapter of the Review. Jack (955) investigated ascendant 
behavior in four-year-old children by means of an experiment made up of 
a series of pairings. Factors differentiating between the upper and lower 
third were studied. 

By comparing the actual performance of sixteen nursery-school children 
in six test situations with their self-assurance in these situations, Emmons 
(920) found that self-assurance was positively correlated with skill, age, 
and intelligence. 


Delinquency 


Several new instruments for the measurement of delinquent tendencies 
have appeared and a much larger group of studies describe the application 
of earlier tests and ratings to groups of children differentiated as problem 
and non-problem or delinquent and non-delinquent. Loofbourow and Keys 
(973) published a battery of four tests known as the Personal Index. The
battery was based upon a study (974) in which ten group tests of behavior 
tendencies were applied to reformatory inmates, to groups of junior high- 
school boys designated as disciplinary problems, and to public school boys 
of like age and intelligence. Burrow (894) developed a social rating sheet 
containing twenty-five desirable attitudes and traits for the study of be- 
havior problems among backward pupils in a special school. 

The research literature developing about the use of the Haggerty-Olson- 
Wickman Behavior Rating Schedules has been summarized to April, 1933 
(998). Statistical inquiries and clinical uses in schools, courts, and guid- 
ance clinics are described. It seems clear that the schedules have validity 
for the prediction of the constellations of behavior which bring children 
into conflict with the mores of social groups and with the machinery of 
the juvenile court. 

Delinquents have been found to have a higher social participation rating 
than non-delinquents (879), and to secure markedly higher average scores 
on the Sims Scale (1019). Holsopple (953) suggested that recidivism may
be related to an inability to inhibit or unlearn behavior habits. There is some 
evidence for a small group that mirror drawing is an index to this ability. 

The studies contrasting delinquent and non-delinquent children furnish 
interesting material on the general question of the validity of personal 
report methods as contrasted to rating, and as to the relationship between 
knowledge and conduct. Using the personal report method, Babcock (880) 
failed to find a differentiation using the Attitude SA Test, the Sweet Test of Per-
sonal Attitudes in Young Boys, Rogers’ Test of Personality Adjustment for
Boys, and various perseveration tests of Stevenson. Mira (987) concluded,
on the basis of tests involving information, ethical discrimination, and 
conduct, that: “These data demonstrate that the correlation between the 
tests of theoretical conduct is small; that between theoretical and real 
conduct is nothing; and that between the tests of actual conduct in face 
of situations which require the possession of the same moral characteristics 
but expressed in two different forms (action in one case and inhibition in 
the other) is highly satisfactory.” 

The differential capacity of twenty-four different tests was studied by 
Casselberry (901). The index finally obtained apparently is predictive of 
success on parole. He found the Laslett-Casselberry Free Association Test 
one of the most discriminating items in the battery. However, Gilliland and 
Eberhart (932) did not find such clear-cut differences on the Laslett test 
in comparing four groups representing different degrees of delinquency 
in the Chicago area. They surmised that there may be considerable differ- 
ences in the vocabulary of delinquents in different parts of the country, 
which may interfere with the comparability of the results. 

Hill (951) measured the extent of cheating by a technic similar to that 
used by Hartshorne and May. Delinquent children in a reformatory were 
differentiated from a problem group of junior high-school boys and from 
a group selected as well adjusted. There was little difference, however, 
between the two high-school groups. 

Inconsistencies in the differential capacities of the same test in different 
studies are probably attributable to variations in the extent to which age, 
sex, and mental ability have been controlled. In certain instances, a spurious 
differential capacity has undoubtedly been attributed to certain tests by 
basing the conclusions on the criterion group which was employed in its 
construction. In such a comparison, a newly constructed test will show 
up favorably as compared to others, and may fail to hold up equally well 
in new applications. 


Character and Personality Scales 


Many of the rating devices intended for use in the home and school are 
organized in batteries designed to give a many-sided description of per- 
sonality. They cannot, conveniently, be treated under a topical organization. 

Hayes (945, 946) published a scale for evaluating the school behavior
of children ten to fifteen. Raters are asked to check a series of state-
ments concerning each child. The statements are concerned with relations
to others, to the rights of others, to teachers, to other pupils, initiative, 
health habits, general interest, scholarship, and study habits. The items are 
scored in terms of weights based upon the judgment of experts as to their 
desirability or undesirability for character or personality development. 
The scores are then translated into percentile ranks by divisions and a 
profile constructed. The same principle was employed in a variation of the 
scale having an age range of nine to fifteen and designed particularly for 
parents (944). An extension was later developed by Hicks (949) for use 
by parents of children six to nine. 

Maller (979) developed a rating scale including fifty aspects of char- 
acter and personality. Each is followed by a brief description divided into 
three sets—low, average, and high. The actual rating is done on the record 
blank which accompanies the scale. Williams (1026) analyzed essay 
reports by teachers as a basis for the construction of a rating form for 
pupil adjustment in a laboratory school. 


Developmental Age 


Furfey’s previous test for developmental age has been revised to simplify 
scoring and increase reliability (929). The test consists of 196 pairs of 
items on things to do, things to have, books to read, etc. The subject chooses 
one of each pair. Developmental age as determined by shift in choices does 
not appear to increase after sixteen. Developmental age equivalents for 
scores are reported. Reliability of the revised test ranges from .85 to .96 
in the various age groupings. 

Measures of developmental age are now available for girls. Plechaty 
(1003) reported a preliminary form of an objective scale for measuring 
developmental age in grade-school girls. Sullivan (1017) described a 
scale based on characteristics and changing interests of girls from seven 
to eighteen years of age. The method of choice of paired comparisons was 
again utilized. Developmental age increased steadily to the sixteenth year 
in girls, with no abrupt change at puberty. Changes are not so regular 
after the sixteenth year. 


Emotions 


The amount of tension in the muscles of the used and unused hands, 
while making responses to pictures and while tapping, was shown to be 
related to ratings on excitability and to school adjustment in the investiga- 
tions by Duffy (915, 916, 917). Tension in the hand muscles was measured 
by a dynamograph. Such a measure may be useful in determining which 
individual frequently manifests a highly aroused state. By combination 
of time-sampling and graphic rating, Lee (971) concluded that instability 
of mood and mood level were measurable characteristics of nursery-school
children. Goodenough (933) quantified data on the frequency, duration, 
causes, and methods of handling anger outbursts of children in the home. 


Eating Behavior


Eliot (919) studied the personality pattern which differentiated finicky 
and non-finicky eaters in nursery schools. A rating scale of thirty-one items
was developed in which trait names were used but in which the quality to be 
rated was described in terms of habits of action. Thirteen out of thirty-one 
traits were found from which reliable distinctions were obtained. Finicky 
eaters among two- or three-year-old children seemed to have poorer general 
health, to be more emotional, self-assertive and self-expressive, and less 
well adjusted and happy than non-finicky eaters. 


Friendship and Quarreling 


Friendship and quarrels have been a popular topic for investigation 
by students of preschool children. The usual approaches involve variations 
of time-sampling and systematic recording which define friendship and 
companionship on the basis of propinquity and frequency of associations. 
The measured data have been interpreted in the light of sex differences, 
age differences, and other measurable characteristics of the children; see
Challman (902), Green (936, 937), and Hagman (939). Mengert (984) 
secured measures of friendliness and unfriendliness in a group of two-year- 
old children by pairing each child with each of the other children and 
noting their reactions. 

With older children, Flemming (925) demonstrated a similarity between 
the scores of students and best friends on various measures of personality, 
intelligence, and social status. 


Honesty 


G. F. Miller (986) intentionally misscored the test papers for two groups 
of college students. The students were then given an opportunity to correct 
their papers with a key. One-fourth of the total papers had been scored 
too high by the instructor and another one-fourth too low. He was thus 
able to note the extent to which students would correct errors which raised 
their scores as compared to errors which lowered their standing. The per- 
cent of honesty was 7.7 for a younger group of students and 58.3 for a 
more mature group. Tuttle (1024) obtained a measure of honesty in terms 
of changed answers in a well-motivated school contest involving over 2,000 
children in the elementary grades of thirteen different schools. The results 
indicated a high correlation between honesty and intelligence and an in- 
crease in honesty from grade to grade. Data by schools are construed as 
indicative of a relationship to geographical area. E. H. Moore (989) 
described a method for measuring honesty in classroom performance. 

The Presseys (1004) compared the honesty of Indian and white children 
in the third, fourth, and fifth grades as measured by the scoring of an 
arithmetic paper, handling money, performance with the eyes closed, and 
individual records of results on the hand dynamometer. In general, Indian 
children appeared to be less honest than white and there was a decrease
of dishonesty with age. 

General interest in the detection of crime makes the monograph by 
Larson and others (967) on the detection of lying particularly timely. 
Questions are given to the subject to which he must respond by “Yes” or 
“No.” The inference as to lying or guilty knowledge is made from a poly-
graph record involving pulse rate, blood pressure, respiratory rate, and
the psychogalvanic reflex. The instrument employed was based upon earlier 
devices with improvements by Larson and Darrow. At times the record 
has been used to obtain a confession. Devices of this type are being brought 
into general use over a wide area, and their validity and reliability have 
probably been exaggerated in popular thinking. The sensitiveness of the 
instrument to a variety of physical and physiological changes makes it 
desirable to restrict its interpretation to skilled persons. 


Humor 


Humor scores were obtained by Landis and Ross (964) by asking subjects 
to rate a carefully selected list of 100 jokes as to excellence. No significant 
relation was found between the humor score and intelligence or introversion. 


Leadership and Popularity 


The time-sampling method was employed by Parten (999) in a study 
of leadership among preschool children. Even at the preschool level there 
appear two types of leaders—the diplomat and the bully. The former con- 
trols a large number of children by indirect suggestion; the latter employs 
force. 

Koch (960) used the method of paired comparisons applied to expres- 
sions of preference by members of a nursery-school group of children to 
obtain a popularity score for each child. These scores correlated .76 with 
the ranking made by the teacher. Popularity scores tended to be higher 
for girls than for boys and positive correlations are shown between the 
measure and compliance with routine, respect for property rights, a ten- 
dency to ask for commendation, and a tendency to tattle. 
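
The scoring idea behind a paired-comparison popularity measure can be sketched as follows. This is only an illustration under stated assumptions, not a reconstruction of Koch's procedure: the function name `popularity_scores`, the children's names, and the recorded choices are all hypothetical. Each judge is shown every pair of children and names the one preferred, and a child's score is the number of times he or she is chosen.

```python
from itertools import combinations


def popularity_scores(children, preferences):
    """Count how often each child is the preferred member of a pair.

    `preferences` maps a judge to a dict from (a, b) pairs to the child
    that judge preferred. All names and data are hypothetical.
    """
    scores = {child: 0 for child in children}
    for choices in preferences.values():
        for pair, preferred in choices.items():
            if preferred not in pair:
                raise ValueError(f"{preferred!r} is not in pair {pair!r}")
            scores[preferred] += 1
    return scores


children = ["Ann", "Ben", "Cal"]
# Every child is paired once with every other child.
pairs = list(combinations(children, 2))

# Hypothetical choices recorded from two judges.
preferences = {
    "judge_1": {("Ann", "Ben"): "Ann", ("Ann", "Cal"): "Ann", ("Ben", "Cal"): "Cal"},
    "judge_2": {("Ann", "Ben"): "Ben", ("Ann", "Cal"): "Ann", ("Ben", "Cal"): "Cal"},
}

scores = popularity_scores(children, preferences)
print(scores)
```

Ranking the children by these counts gives an ordering that could then be compared with a teacher's ranking by rank correlation.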

Leadership among adolescent boys has been studied by Partridge (1002) 
in the summer camp and scout troops by means of a five-point man-to-man 
rating scale. Leaders were found to excel their fellows in all measures 
of mental and physical traits but did not tend to fall into definite types. 
The bibliography of 143 titles is of value to the investigator. Jenny’s study 
(957), using tests and attitudes scales, indicated that in the summer camp 
the most acceptable boys were well adjusted, resourceful, and capable of 
leadership. Non-acceptable boys tend to be problem cases. 

In Garrison’s investigation (931) students were given instructions to 
list in order the five individuals of the class, boys or girls, that they admired 
most. A leadership score was obtained by means of weighting positions 
held in high school by each senior during the past three years. Marked
correlations were found between the admiration score and the leadership
score and significant relationships between scholarship and leadership 
scores. A small correlation was found between the father’s occupational 
rating and the leadership score. 

Flemming (926) obtained pleasing personality ratings from classmates. 
These ratings were then correlated with other measures and ratings of 
traits obtained from the teachers. Pleasing personality was found to be 
positively related to intellectual enthusiasm, capacity for independent 
thought and for independent work, industry, persistence, social adapta- 
bility, rejective ability, dependability, self-control, and good manners. 

Cowley (910) gave 12 psychological tests to 112 subjects consisting 
of criminal leaders and followers, non-commissioned officers and privates 
in the United States Army, and student leaders and followers. In general, 
leaders rated themselves higher in self-confidence. They scored higher in 
motor impulsion and took an appreciably shorter time to: (a) determine 
whether their decisions would stand; (b) arrange a set of mottoes about 
tact; (c) call out the length of lines on a pack of 70 cards; and (d) arrange 
a set of mottoes about aggressiveness. 


Negativism 


The resistant, acquiescent, and aggressive behavior of thirty-six nursery- 
school children toward other persons was measured by controlled observa- 
tion of activities, by stenographic reports of language, and by records of 
each child’s behavior during intelligence tests (897). 

A test for susceptibility to majority opinion was devised and presented 
by Barry (881) on the basis of earlier work by Moore. Individuals tend 
to change previous judgment to conform to the majority opinion. Striking 
individual differences were found in susceptibility. “S” was designated as 
a measure of negativism and compliance. Persons with low scores in “S” 
tended to be critical, derogatory, and irritable, and persons who were them- 
selves irritable tended to rate others in a similar fashion. 


“Only” Children 


Theory has tended to emphasize the importance of being an only child 
as a personality determiner. Campbell (898), in summarizing 75 titles 
on the problem, concluded that research, both clinical and non-clinical, 
gives less and less support to this viewpoint. More analytical and well- 
controlled investigations are needed to establish the significance of “onli- 
ness” as such (also see discussion in Chapter VI). Witty (1027) studied 
153 only children five years of age. Comparisons were made between 
ratings and measurements of this group and various control groups. Only 
children show themselves superior to other children in health, physical 
development, intelligence, and character traits. 








Play 


Play furnishes a rich setting for observational and experimental studies 
of character and personality. There is apparently no tendency to study 
play as a continuum of personality in itself. Factors which make for par-
ticipation or lack of it have been investigated. Hurlock’s review (954)
of 128 titles of experimental investigations of childhood play is the
readiest source for the student.


Recklessness 


Burtt and Frey (895) have developed a battery of tests for the measure-
ment of recklessness. The tests involve such things as balancing a long rod,
putting nuts on machine screws, and filling graduates with water to a
designated mark. The criterion with an estimated reliability of .86 was
obtained by means of a graphic rating scale. The items of the test, properly
weighted, correlated .60 with the rating. Factor analysis of the inter-corre- 
lations suggests that the principal element is one of haste. 


Social and Ethical Information 


Several instruments for the measurement of knowledge of social stand-
ards have appeared during the period. That by Tomlin (1022) is applicable
to children in grades four to eight. Reliability and validity data are sup- 
plied. 

The test of social usage by Strang, Brown, and Stratton (1016) is 
particularly applicable in junior and senior high school. It covers table 
manners, taste in dress and appearance, good manners for the guest and 
host, good form in relation to others in social and play situations, and 
respect for property. Strang (1015) demonstrated that there is an increase 
in knowledge with age and grade, a positive correlation with intelligence,
and a significant difference in favor of children whose parents belong to 
professional groups. No data have been supplied to validate either of the 
foregoing tests in terms of overt action. 

The Shields’ Moral Judgment Examination (972) yields an age score 
called “age of responsibility.” The test involves vocabulary, comprehen- 
sion in ethical situations, definitions, offense comparisons, sentence con- 
struction, and judgment. 

The ethical concepts and feelings of 20 boys, thirteen and fourteen years
of age, were given intensive study by Hermsmeier (948). Evaluations made 
by the experimenter on the basis of test situations show a high corres- 
pondence with estimates of parents and teachers. Ethical concepts were 
tested by the use of definitions, the subsumption of anecdotes under con-
cepts, and the differentiation of paired examples of ethical qualities.
Eight other methods were also used to arrive at measures of ethical feeling 
or information. 


Socio-Economic Factors


Chapin (903) developed a Social Status Scale, 1933, which is briefer 
and simpler than the Living Room Scale, 1931, while still comparable in 
reliability and validity. An abbreviated form based upon the personal 
report method of the Sims scale was prepared by Wrightstone (1031). 
A scale for social adequacy was developed by McCormick (976, 977) on 
the basis of the type of data sought by social workers. 

A plan for scoring the opportunities provided by parents for the devel- 
opment of money experiences of children was presented by Hanson (941). 


Studiousness 


It would appear that only a few of the many procedures advocated for 
methods of study actually differentiate between superior and inferior 
college students. Using self-ratings, Eurich (921) studied the relationship 
between study habits and achievement. Nine of the characteristics appar- 
ently differentiated well between groups of superior and inferior students. 

Wrenn and McKeown (1028) selected the study habits which differentiate 
best between students equal in intelligence and unequal in scholarship. 
These have been published in their Study Habits Inventory. The items are 
weighted in terms of their differential capacity and yield a total score to 
describe the adequacy of the student’s general work habits. 


Vocalization 


With the development of radio broadcasting, renewed interest has been 
evident in the judgment of personality from the voice. Allport and Cantril 
(876) had listeners judge both physical and personality traits by listening 
to the natural and broadcasting voice of the speaker. They concluded that 
the voice does give some correct information concerning the outer and
inner characteristics of personality. The tendency toward stereotypy is 
shown in the fact that uniformity of opinion is somewhat greater than 
accuracy as judged by actual measured data on physical and personal 
traits. There was more consistency and correctness in the judgment of 
inner characteristics than in the judgment of outer characteristics. 

Time-sampling technics applied to the measurement of talkativeness in 
children have demonstrated that reliable measures may be secured (1010). 
The study by Robinson and Conrad differs somewhat from that of Schu- 
bart (1012). In the former, presence or absence of talking in a given time 
period was the unit of measurement, while in the latter a mechanical 
counter was also used so as to secure a measure of total output and rate. 
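
The two units of measurement contrasted above can be illustrated with a short sketch. It assumes hypothetical observation records, interval lengths, and function names (`talkativeness`, `output_and_rate`); what the mechanical counter registered is not stated here, so the counts are treated simply as units of speech per interval.

```python
def talkativeness(intervals):
    """Proportion of observation intervals in which any talking occurred."""
    if not intervals:
        raise ValueError("at least one interval is required")
    return sum(1 for talked in intervals if talked) / len(intervals)


def output_and_rate(counts, seconds_per_interval):
    """Total counted output and output per second of observation."""
    if not counts or seconds_per_interval <= 0:
        raise ValueError("need counts and a positive interval length")
    total = sum(counts)
    return total, total / (len(counts) * seconds_per_interval)


# Hypothetical record for one child: eight 30-second intervals.
observed = [True, False, True, True, False, False, True, False]
counts = [12, 0, 7, 9, 0, 0, 4, 0]  # assumed counter readings per interval

share = talkativeness(observed)          # presence-or-absence measure
total, rate = output_and_rate(counts, 30)  # counter-based measure
print(share, total, rate)
```

The first measure treats an interval with one remark the same as an interval of continuous speech; the second separates the two, which is the practical difference between the procedures.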

Brackett (891) found that the laughing and crying behavior of preschool 
children could be measured reliably by the use of direct observation and 
short-time samples. Laughter appeared to be a highly consistent pattern of 
behavior in varying situations and is somewhat more social than crying. 
Laughing tendencies increased with age in the preschool group while crying 
decreased. Justin (958) investigated laughter-provoking stimuli over an
age range. Her method of procedure, involving timing, permitted quantita-
tive measures with a consequent study of age change in relation to other
characteristics such as intelligence. Thirty-six references add to the value 
of the study for the investigator. Kenderdine (959) also studied laughter 
in the preschool child. Ding and Jersild (914) have reported on the same 
behavior in the Chinese. 

Shirley (1013) measured manifestations of irritability in infants during 
physical, anthropometric, and psychological examinations. An elimination 
of screaming, crying, fussing, etc., is shown with age during the early days, 
weeks, and months. Timed records of laughing, crying, etc., were used 
by Bridges (893) as measures of emotionality in infants. 


THE PATTERNING OF CHARACTER AND PERSONALITY MEASURES 
Achievement 


The unique contribution of character and personality data to the pre- 
diction of achievement has been the subject of a number of investigations. 
The negligible relations between symptom questionnaires and achieve- 
ment have been noted in Chapter VI. These are in contrast to the definite 
relations found by the use of action criteria. 

Sorenson (1014) made a correlational analysis of the relationship 
existing between academic grades, industrial grades, intelligence, mechani- 
cal ability, mechanical interests, and problem tendency scores on Schedule 
B of the Haggerty-Olson-Wickman Rating Schedules in a study of junior 
high-school children. He found the highest correlation between the intel- 
ligence test and average grade (.62), with the relation to Schedule B 
scores next (-.55). Marks could be predicted three semesters in advance
with Schedule B, with only slightly less accuracy (-.51) than that for the
semester in which ratings were secured. Schedule B scores, academic 
grades, and paper form board scores predicted industrial grades about 
equally well. By partial and multiple correlation technics, Sorenson con-
cluded that scores on Schedule B give a unique contribution to the predic-
tion of school marks in a junior high school.

Turney followed his intensive monograph report (1023) on the relation-
ship between character traits and achievement with a series of articles.
In general, he concluded that between marks and the traits of industry, 
perseverance, dependability, and ambition, the correlations are as high 
or frequently higher than the correlations between I. Q. or mental age and 
marks. The correlations between traits and marks are but little affected 
when intelligence quotient or mental age is held constant. He held that 
the ratings are real measures of aspects of the personality which contribute 
to achievement. Intercorrelations between ratings and intelligence, vocabu- 
lary, scholarship, honesty rating, and objective test rating were made by 
Garrison and Howell (930). Reliable positive correlations were found 
between the various character traits and scholarship. 
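
The statement that a correlation is little affected when intelligence is held constant refers to partial correlation. As a reminder of the standard first-order formula, and not as a report of any investigator's computations, let subscript 1 denote marks, 2 a trait rating, and 3 intelligence; these subscript assignments are assumptions made here for illustration.

```latex
r_{12\cdot 3} \;=\; \frac{r_{12} - r_{13}\, r_{23}}
                         {\sqrt{\left(1 - r_{13}^{2}\right)\left(1 - r_{23}^{2}\right)}}
```

When the rating is nearly uncorrelated with intelligence ($r_{23}$ near zero), the numerator stays close to $r_{12}$ and the partial coefficient is at least as large as the original in absolute value, which is the pattern described above.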

Hicks and Hayes (950) used a coded time-sampling method to record
the verbal responses of junior high-school pupils in classroom discussions. 
The verbal responses were classified in terms of their content in subject- 
matter and the extent to which they represented desirable and less desirable 
aggressiveness. These values were then compared for four groups of chil- 
dren made up on the basis of combined ranks in general intelligence, 
school achievement, and general personality traits. Command of subject- 
matter and desirable aggressiveness were characteristic of the higher 
groups, and lack of command of subjectmatter and undesirable aggressive- 
ness were characteristic of the lower groups. 

McElwee (978) contrasted accelerated, normal, and retarded children 
in a public school on a checklist filled out by teachers. On the average, 
the accelerated children seemed to possess a greater degree of all the 
desirable traits than did the retarded children. Laycock (970) contrasted 
superior and inferior children on ratings of maladjustments made by 
teachers and ratings made by himself on the basis of personal interviews 
with the children’s parents. In general, the superior group received more 
desirable ratings by both approaches. 

The trend in secondary schools has been confirmed by Hartson (942) at 
the university level. He gathered data for 500 students on college freshman 
grades, high-school grades, psychological examinations, and ratings for 
eleven items in a personal rating scale. The eleven items had been rated 
by the students’ high-school principal, a teacher, and a friend. Personality 
ratings correlated better with scholarship in high school than in college, 
and ratings by the principal correlated better than those by the other 
judges. The personal estimate as a whole yielded higher correlation with 
college grades than did high-school grades or intelligence test scores. A 
combination of high-school grades, intelligence scores, and personal esti- 
mate proved to be a better measure for predicting college scholarship than 
any one or a combination of two of the above items. 

Ratings of twenty-five students by twenty-five others for beauty or phys- 
ical attractiveness were found to correlate only very slightly with scholar- 
ship and not at all with intelligence in an investigation by Mohr and 
Lund (988). 

It would appear that most external approaches to the evaluation of con- 
duct and personality make a unique contribution to the prediction of 
general educational achievement. The evidence suggests the existence of 
a basic patterning and a possible central relationship in which achieve- 
ment is but one of a number of peripheral expressions of the adjustment 
of the organism. 


Complex Interrelationships 


Typical studies in the patterning and interrelationships of behavior 
involve measurement by short-time samples in social situations for a series
of definable aspects of the total, including such matters as language, type
of participation, amount of physical contact, etc., with further analysis by
sex, age, and other measurable characteristics (878, 884, 935, 994, 1000). 

Bott (890) gave intensive consideration to the problem of interrelation- 
ship in observational studies of social and material activities, verbal and 
motor activities, social relations among children, relations with adults, 
and personal activities. 


THE MODIFICATION OF INFORMATION AND CONDUCT 


Applications of measures of character and personality in the determina- 
tion of the result of both direct and indirect approaches to the modification 
of character are increasing in number. The extent of this trend is probably
not entirely apparent in current publications. An examination of lists of 
unpublished master’s theses shows large numbers of small studies. 


Reviews 


The obligation of the present Review for a report on applications is 
reduced since the December, 1934, issue, devoted to psychology and
methods in the high school and college, has a chapter on moral and char- 
acter education by Symonds and Kirkendall. Some overlapping exists in 
the present account where desirable for the sake of continuity. Changes 
in attitude are discussed in Chapter VII of the present number of the Review. 

General summaries of practices with technical evidence are available 
through bulletins of the Research Division of the National Education 
Association (993) and through yearbooks of the Department of Super- 
intendence (992) and the Department of Classroom Teachers (991). 
Heaton (947) has produced a convenient exposition for practices and 
sources. 


Direct Instruction 


Clevett (904) secured positive findings in increased honesty in athletic 
contests (used as test situations) in an experimental group in which 
experiences had been utilized as the basis for a discussion of problems of 
character. The control group had a very formal program with no conscious 
character education. Zyve (1033) called attention to the somewhat specific 
character of the outcomes in honesty which accrue through the utilization 
of examination situations for instruction in integrity. 

The work book by Charters, Rice, and Beck entitled What’s the Right 
Thing To Do? was experimented with as direct instructional material by 
Cressman (911). One group of seventh-grade pupils used the work
book for self-study, a second with presentation and discussion of the prob- 
lems by the teacher, and a third was used as a control. Character tests from 
the work book and from the Character Education Inquiry were given before 
and after the training period. Both instructed groups showed improvement 
and the one with the heaviest reliance on the work book ranked the highest. 
Fifteen periods of thirty minutes each devoted to a study of principles of
honesty produced some differences in information as contrasted to a
control group in Hobson’s study (952).

An increase in critical thinking was demonstrated by Biddle (887) as a 
result of a study by pupils of methods used in current propaganda. Experi- 
mental and control groups were used from the eleventh, twelfth, and thir- 
teenth grades. Tests on gullibility were given before and after. A significant 
improvement was found in the experimental group as compared to the 
control group. 

Robb and Faust (1009) found small but consistent differences in favor 
of an experimental group as a result of eight weeks’ instruction in ethics 
in one section of a civics course as compared to another which had the usual 
instruction in problems of democracy. A second experiment with two groups 
of ninth-grade children, one of which received eighteen home-room pro- 
grams devoted to the presentation and discussion of moral problems, re- 
sulted in inconsistent results. Lectures and conferences on the qualities 
and technics of leadership produced increased gains in experimental as 
contrasted to control groups in the investigations of Eichler and Merrill 
(918). Leadership was measured by the ratings of classmates. 


Indirect Instruction 


Participants in athletics achieved greater gains than spectators or non- 
participants in six tests from the Character Education Inquiry in one 
school, but failed to be differentiated from non-athletes by teacher ratings 
in two others (938). 

Allen (875) analyzed the character outcome accruing in two methods 
of teaching plane geometry. One group was taught by the traditional recita- 
tion method and the other by individualized instruction. He concluded 
that the individual instruction group was definitely superior at the end 
of the year in mathematical achievement, with small but consistent gains 
in the direction of more emotional stability, extroversion, submission, self- 
sufficiency, honesty, broadmindedness, and less mathematical interest. 
Students of Latin scored higher in civic attitudes, pacifism, and liberalism 
in the investigation by Meek (983). 

Mayberry (982) compared members of student councils in high schools 
with a similar number of paired students not members of the council. 
The results indicate that the council group was superior in the character- 
istics measured by the Upton-Chassell Citizenship Scale. 

The relation of newer practices in schools to character and other out- 
comes was described by Wrightstone in a series of studies. Experimental 
practices in the social studies, for example, may be employed to develop 
students in the direction of liberal civic attitudes and beliefs (1030). A 
number of articles have been devoted to methods of appraisal (1029). The 
monograph report, which is now available, falls properly in the period 
of the next cycle of the Review. 
Motivation


Cooperation and honesty tests were given to 215 pupils in grades seven 
and eight under both personal and social motivation (981). In general 
the subjects worked at a higher efficiency and were more deceptive when 
the score was to count for personal gain than when it was to count as a 
gain for the group. The same trend was reported by Forlano (927). 


Effect of Organization Membership 


Boy Scouts were not delinquent as often as non-Scouts in Fairchild’s 
investigation (922) of 500 Scouts and 500 non-Scouts from ten communi- 
ties. Scouting, however, cannot be credited with the differences since they 
tend to disappear when an equation for socio-economic factors is made. 
The same trend appears in ratings based upon twelve traits of character 
included in the Scout Law. The Fairchild investigation has been sum- 
marized by Dimock (913) with suggestions for desirable extensions. 

Feder and L. W. Miller (923) attempted to evaluate certain phases of 
a comprehensive program of character education for boys. The “X” plan 
involved military training and an intensive program of active club work 
in which the children had participated for a period of from two to five 
years. Comparisons were made with health tests, citizenship tests, attitudes 
toward war, and behavior ratings. Boys trained under the “X” plan did 
not differ materially on the above measures from boys in general. 


Summary 


A detailed reading of the experimental studies on the modification of 
information and conduct impresses one with the dominance of individual 
differences over experimental factors operating through brief periods of 
time. Differences in persons at the beginning of experiments are usually 
greater than changes which can be expected. The experimental differences 
secured are often small and specific or inconsistent. No evidence has ap-
peared which demonstrates that character or personality can be easily or
rapidly modified. Some shifts in attitudes and information have been 
demonstrated with direct and indirect approaches. Future investigations 
should strive for larger samples, more complete appraisals of changes in 
information, attitudes, and conduct, and longer periods for the operation 
of the experimental variables.